1.0  SUMMARY

Numerical and theoretical investigations of Grover’s quantum search algorithm were carried out.
The combination of coarse and fine grain parallel resources was explored as a means to utilize
current massive multi-processing capabilities typically used for large scale graphics rendering.
The results, though preliminary, were encouraging for the use of a hybrid coarse/fine grain
approach for numerical simulation of quantum algorithms. A variant of Grover’s algorithm was
developed that explicitly searches on one component of a database to find an associated element
in the complementary component of the database. The development of single photon quantum
optical gates written into centimeter-sized photo-thermal refractive glass as volume holograms
was investigated. While not a scalable technology in general, this approach was shown to have an
advantage for gates with a small number of qubits, offering footprint savings over corresponding
meter-sized free space gates. A simulation of an entangled Bell state photon pair topologically
encoded into a cluster state was carried out to explore the advantages of the measurement-based
one-way quantum computation paradigm. The error threshold rates of approximately 5% and 8%
that were computed are typically an order of magnitude higher than the most promising error
threshold rates obtained by means of the standard quantum circuit model, indicating the potential
power of the cluster state quantum computation paradigm. The development of a photon-based
quantum information science (QIS) testbed, and the construction and validation of a group
velocity matched (GVM) crystal assembly and a multipli-entangled photon source crystal
assembly, are described herein. These crystal assemblies were constructed and investigated for
the more efficient generation of entangled photons as an input source to quantum computing
circuits. The GVM source was shown to increase the usable entangled photon rate by removing
the spectral distinguishability, and the multipli-entangled photon source increased the usable
pairs by a factor of six over the single pair typically produced. These two assemblies mitigated
the problems inherent in conventionally used sources.

2.0  INTRODUCTION 

Under  this  in-house  project  we  constructed  a  six  qubit  (quantum  bit)  capable  photon-based 
experimental  testbed  and  explored  topics  related  to  both  theoretical/numerical  simulation  and 
experimental  investigations  of  quantum  computation.  These  investigations  included: 
theoretical/numerical  simulation  -  (i)  a  hybrid  coarse/fine  parallel  simulation  of  Grover’s 
quantum  search  algorithm;  (ii)  the  development  of  a  variant  of  Grover’s  algorithm  that  explicitly 
searches  on  one  component  of  a  database  to  find  an  associated  element  in  the  complementary 
component  of  the  database;  (iii)  the  design  of  quantum  optical  gates  by  means  of  volume 
holography  in  photo-thermal  refractive  (PTR)  glass,  and  (iv)  an  investigation  of  the  advantage  of 
utilizing  cluster  states  for  a  one-way  quantum  computational  paradigm;  experimental  -  (v)  the 
construction  of  an  advanced  quantum  information  science  testbed  for  development  of  photon- 
based  quantum  gates  and  circuits;  (vi)  the  construction  and  validation  of  a  group  velocity 
matched  (GVM)  temporal  compensator  crystal  assembly  to  increase  the  usable  range  of 
entangled  photon  sources,  and  (vii)  the  development  and  characterization  of  a  new  multipli- 
entangled  photon  source  that  increased  the  usable  number  of  photon  pairs  by  a  factor  of  six. 
Theory/numerical simulations:

Grover’s  search  algorithm  (GSA)  serves  as  an  important  prototypical  benchmark  for  many 
numerical  simulation  efforts  of  quantum  algorithms  [Grover97,  Walther05].  In  one  of  the 
simulation  portions  of  the  research  we  investigated  the  use  of  hybrid  coarse  and  fine  grain 
parallelism  to  numerically  simulate  GSA.  The  goal  was  to  investigate  the  use  of  conventional 
distributed  computation  utilizing  MPI  (Message  Passing  Interface)  on  a  parallel  cluster,  whose 
CPUs also had access to multi-core GPUs (graphics processing units).

Grover’s oracle-based unstructured search algorithm is often stated as “given a phone number in
a directory, find the associated name.” More formally, the problem can be stated as “given as
input a unitary black box $U_f$ for computing an unknown function $f: \{0,1\}^n \to \{0,1\}$,
find $x = x_0$, an element of $\{0,1\}^n$, such that $f(x_0) = 1$ (and zero otherwise).” The
crucial role of the externally supplied oracle $U_f$ (whose inner workings are unknown to the
user) is to change the sign of the solution $|x_0\rangle$, while leaving all other states
unaltered. Thus, $U_f$ depends on the desired solution $x_0$. We developed/simulated an
amplitude amplification algorithm in which the user encodes the directory (e.g. names and
telephone numbers) into an entangled database state, which at a later time can be queried on one
supplied component entry (e.g. a given phone number $t_0$) to find the other associated unknown
component (e.g. name $x_0$). For $N = 2^n$ names $|x\rangle$ with N associated phone numbers
$|t\rangle$, performing amplitude amplification on a subspace of size N of the total space of
size $N^2$ produces the desired state $|x_0\rangle|t_0\rangle$ in $\sqrt{N}$ steps.

In  this  in-house  project  we  utilized  photon-based  qubits  for  the  development  of  quantum  gates 
and  circuits.  These  qubits  propagated  in  free-space  (routed  into  optical  fibers  for  measurements) 
and  hence  the  quantum  gate/circuit  consisted  of  optical  elements  (beam  splitters,  waveplates, 
etc.) arranged on meter-sized optical tables. In one part of this research we explored the
feasibility of using volume holograms to construct simple optical quantum gates in
centimeter-sized PTR glass. Volume holography is typically used today for 2D image storage
utilizing 394 pixels/µm², which consumes only 1% of the theoretical volumetric storage density
($1/\lambda^3$) [Burr01]. This field, first introduced by Dennis Gabor in 1948, has been well
established ever since the development of the laser in 1960. As the emulsion of the hologram
increases in thickness its angular selectivity, i.e. its ability to distinguish between two
plane waves separated by a small angle, increases and it is able under certain well-known
conditions to achieve near perfect efficiency [Goodmann05]. A hologram is considered a volume
hologram if the emulsion thickness $d \gg \Lambda^2/\lambda$, where $\Lambda$ is the
characteristic period of the index of refraction of the grating, and $\lambda$ is the wavelength
of the light. For our purposes, it is important to emphasize that volume holography enables
higher storage densities, and under suitable recording configurations can achieve near perfect
efficiencies. The goal of this portion of the research was to investigate the possible use of
volume holograms in PTR glass to create simple single photon quantum optical gates.

In the standard quantum circuit model (QCM) paradigm, quantum computations are executed by
successive unitary operations acting upon an initial quantum state composed of many qubits.
These unitary operators create entanglement amongst the qubits through quantum interference.
Entanglement is a uniquely non-classical property of quantum mechanical systems in which the
correlations between sub-systems can be stronger than those allowed by classical (conventional)
computing systems. Recently a new alternative paradigm for quantum computation has emerged
called one-way quantum computation (OWQC) [Raussendorf01]. In the one-way quantum
computer, information is processed by sequences of single-qubit measurements. These
measurements are performed on a universal resource state, the 2D cluster state, which does
not depend on the algorithm to be implemented. This new approach to quantum computation goes
by the collective name measurement-based quantum computation (MQC) [Briegel09]. The
appeal of MQC is that deterministic quantum computation is possible based on (i) the
preparation of an initial entangled cluster state followed by (ii) a temporally ordered pattern of
single qubit measurements and feed-forward operations which depend on the outcome of the
previously measured qubits [Raussendorf01]. Our interest in OWQC is in the utilization of
photon-based cluster states as gates and circuits for quantum computation (see [Vallone08], and
references therein). It has been claimed that the use of cluster states can substantially reduce the
resource overhead of the standard QCM approach to photon-based quantum computation.

Experimental: 

Photons are particularly desirable for quantum information processing tasks since they are
relatively  free  from  environmental  decoherence.  Hence,  they  are  also  essential  for  any  long 
distance  conveyance  of  quantum  information,  and  do  not  require  cryogenic  cooling.  Entangled 
photon  sources  with  the  highest  mode  quality  are  based  on  spontaneous  parametric  down 
conversion  (SPDC).  This  is  a  process  where  laser  pump  photons  are  converted  into  ‘signal’  and 
‘idler’  entangled  pairs  in  nonlinear  (NL)  crystals.  SPDC  in  nonlinear  crystals  has  provided  the 
optical  sources  for  groundbreaking  foundational  and  applications  work  in  quantum  optics  (QO) 
for  the  last  two  decades  [O’Brien07]. 

SPDC  is  an  inherently  inefficient  process,  and  work  based  on  it  is  generally  limited  by  the  net 
signal  level  or  the  number  of  photons  that  can  be  entangled  in  given  applications.  Photon  yield  is 
related  to  laser  power,  which  cannot  be  increased  beyond  the  level  where  higher  order  NL 
contributions  (multi-photon  events)  yield  errors  in  quantum  processing  applications.  This  point 
has  now  been  reached  in  applications  that  require  independent  sources  of  entangled  qubits.  The 
work  addressed  in  this  in-house  project  focused  on  (i)  developing  a  6-qubit  capable  photon-based 
quantum information testbed and (ii) developing new sources of entangled photons that greatly
increase process efficiency, without increasing laser power, in a regime where high detection
quantum efficiency is available, a highly desirable goal not previously accomplished in the
scientific community.

Experimental demonstrations of entanglement in photon pairs have more recently become of
interest  in  quantum  computational  architectures  that  operate  by  principles  entirely  distinct  from 
those  based  on  classical  physics.  Experiments  such  as  two-photon  interference  in  Hong-Ou- 
Mandel  Interferometers  (HOMI)  [Hong87]  and  most  quantum  cryptography  implementations 
require  only  single  photons,  not  entangled  photons,  and  hence  single  crystals.  Most  other 
quantum  information  experiments  require  multiple  crystal  sources  for  entangled  photons.  It  has 
been  found  that  multi-crystal  sources  of  entangled  pairs  are  not  feasible  with  the  continuous 
wave  (CW)  pump  lasers  that  were  used  throughout  the  original  QO  developments;  short  pump 
pulses  are  essential  for  the  multiple  interference  effects  to  be  realized.  The  temporal-spectral 
information  inherent  in  pulses  however  affects  and  constrains  the  quantum  interference.  The 
effects must be clearly understood to optimize the performance in practical applications. In this
in-house project we directed our efforts to the construction and validation of a group velocity
matched  (GVM)  temporally  compensated  crystal  assembly  to  increase  the  usable  range  of 
entangled  photon  sources,  and  to  the  development  and  characterization  of  a  new  multipli- 
entangled  photon  source  crystal  assembly  that  increased  the  usable  number  of  photon  pairs  by  a 
factor  of  six. 

3.0  METHODS,  ASSUMPTIONS,  AND  PROCEDURES 

3.1.  Hybrid  coarse/fine  parallel  simulation  of  Grover’s  quantum  search  algorithm 

Driven  primarily  by  the  video  gaming  industry’s  need  for  massive  graphics  processing,  the 
programmable  GPU  has  evolved  into  a  computational  workhorse.  With  multiple  cores  driven  by 
very  high  memory  bandwidth,  the  GPU  holds  potential  for  non-graphics  processing  scientific 
computing. The main reason for such optimism is that the GPU is specialized for
compute-intensive, highly parallel computation (exactly what graphics rendering is about) and therefore is
designed  such  that  more  transistors  are  devoted  to  data  processing  rather  than  data  caching  and 
flow  control. 

More  specifically,  the  GPU  is  especially  well-suited  to  address  problems  that  can  be  expressed  as 
data-parallel  computations  -  the  same  program  is  executed  on  many  data  elements  in  parallel  - 
with  high  arithmetic  intensity  (the  ratio  of  arithmetic  operations  to  memory  operations).  Because 
the  same  program  is  executed  for  each  data  element,  there  is  a  lower  requirement  for 
sophisticated  flow  control;  and  because  it  is  executed  on  many  data  elements  and  has  high 
arithmetic  intensity,  the  memory  access  latency  can  be  hidden  with  calculations  instead  of  big 
data  caches.  Data-parallel  processing  maps  data  elements  to  parallel  processing  threads.  Many 
applications  that  process  large  data  sets  such  as  arrays  can  use  a  data-parallel  programming 
model  to  speed  up  the  computations. 

Currently,  AFRL/RI  is  actively  pursuing  large  scale  parallel  scientific  computing.  At  the 
AFRL/RI  Naresky  High  Performance  Computing  Facility  the  main  computational  resource  is  an 
AFRL/RI 500 TFLOP (July 2010) integrated HPC system consisting of 2,016 PlayStation 3 (Cell
Broadband Engine processor) nodes and 84 x86 servers, each with an NVIDIA Tesla C1060 and an
NVIDIA Tesla C2050 GPGPU (general purpose graphics processing unit).

The exploratory codes we developed to simulate Grover’s quantum search algorithm utilized a
combination of MPI libraries for conventional distributed parallel communication between the
host CPUs and CUDA (Compute Unified Device Architecture) from NVIDIA [CUDA07]. CUDA is a
new hardware and software architecture for issuing and managing computations on the GPU as a
data-parallel computing device without the need to map them to a graphics API.

When  programmed  through  CUDA,  the  GPU  is  viewed  as  a  compute  device  capable  of  executing 
a  very  high  number  of  threads  in  parallel.  It  operates  as  a  coprocessor  to  the  main  CPU,  or  host. 
In  other  words,  data-parallel,  compute-intensive  portions  of  applications  running  on  the  host  are 
off-loaded  onto  the  device.  More  precisely,  a  portion  of  an  application  that  is  executed  many 
times,  but  independently  on  different  data,  can  be  isolated  into  a  function  that  is  executed  on  the 
device as many different threads. To that effect, such a function is compiled to the instruction set
of the device and the resulting program, called a kernel, is downloaded to the device. Both the
host and the device maintain their own DRAM, referred to as host memory and device memory,
respectively. One can copy data from one DRAM to the other through optimized API calls that
utilize the device’s high-performance Direct Memory Access (DMA) engines. This is illustrated
in Figure 1.

Figure 1. CUDA host/device structure: the host (CPU) issues a succession of kernel
invocations to the device (GPU). Each kernel is executed as a batch of threads organized as
a grid of thread blocks.

Grover’s quantum search algorithm executes an unstructured search on a collection of n qubits
(quantum bits, e.g. two-level atomic quantum systems, the two polarization states of a
photon, etc.), which, due to the tensor product nature of composite quantum systems,
represents $N = 2^n$ cbits (classical bits). This exponential scaling of accessible information N
with the linear number n of the physical qubits is behind the power and lure of quantum
computation. Because of the quantum superposition principle, all N bits can be searched
simultaneously. The unstructured search problem is often colloquially stated as “finding a needle
in a haystack,” or “given a telephone number, find the associated name in a telephone directory.”
In a conventional (classical) unstructured search problem, one would need on the order of N/2
queries to an oracle (returning a yes or no answer to the question “is this the needle?” or “is this
the correct name?”) to find the correct solution (e.g. the “needle” or the “name”). Utilizing
quantum parallelism, the GSA can find the answer using only on the order of $\sqrt{N}$ queries,
a quadratic speedup over the conventional algorithm.
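
To make the quadratic speedup concrete, the following minimal sketch tabulates the two query counts for a few register sizes. It assumes a single marked item and uses the standard iteration count $\lfloor \pi\sqrt{N}/4 \rfloor$ for the quantum case; the function names are illustrative and are not taken from the project codes.

```python
import math


def classical_expected_queries(n_items: int) -> float:
    """Order-of-magnitude classical cost quoted in the text: about N/2 oracle queries."""
    return n_items / 2


def grover_iterations(n_items: int) -> int:
    """Grover iteration count floor(pi * sqrt(N) / 4), assuming one marked item."""
    return math.floor(math.pi * math.sqrt(n_items) / 4)


# Compare the two counts as the number of qubits n (and hence N = 2**n) grows.
for n_qubits in (3, 10, 20, 30):
    n_items = 2 ** n_qubits
    print(n_qubits, n_items, classical_expected_queries(n_items), grover_iterations(n_items))
```

The gap between the two columns widens with n, since one count grows as N and the other as its square root.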

The quantum state $|\psi\rangle$ can be represented as an N-dimensional unit vector with complex
entries called quantum amplitudes, $|\psi\rangle = \sum_x c_x |x\rangle$. The entries of the
vector $|\psi\rangle$ represent the N possible states labeled (decimally) as $|x\rangle$ with
$x \in \{0,1,\ldots,N-1\}$. The squared amplitude $|c_x|^2$ gives the probability that the state
$|\psi\rangle$ will be found in (“collapse” to) the component state $|x\rangle$ after the
execution of a physical measurement. Quantum operations act upon $|\psi\rangle$ by means of
unitary matrices U, transforming the system to the new, normalized state $|\psi'\rangle$ via
$|\psi'\rangle = U|\psi\rangle$. The unitary operator U represents physical operations (e.g. the
illumination of atoms/ions by lasers, application of gate voltages to quantum
dots/superconducting circuits, passage of light through optical elements, etc.) that must be
implemented upon the physical realization of the quantum state vector $|\psi\rangle$. This is
called the quantum circuit model of quantum computation. For the study of quantum algorithms,
we can abstract away concerns of physical realizations and implementations (though this is of
intense research interest both experimentally and theoretically).
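
As a small illustration of this state-vector picture, the sketch below stores the amplitudes $c_x$ in a list, applies a unitary by matrix-vector multiplication, and reads off measurement probabilities as $|c_x|^2$. The single-qubit Hadamard matrix is used only as a convenient example of a unitary; the helper names are hypothetical.

```python
import math


def apply_unitary(matrix, state):
    """Return U|psi> for a square matrix U and an amplitude list |psi>."""
    size = len(state)
    return [sum(matrix[row][col] * state[col] for col in range(size)) for row in range(size)]


def probabilities(state):
    """Measurement probabilities |c_x|^2 for each component state |x>."""
    return [abs(amplitude) ** 2 for amplitude in state]


s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]   # example unitary acting on a single qubit

state = [1 + 0j, 0j]           # the state |0>
new_state = apply_unitary(HADAMARD, state)
probs = probabilities(new_state)

# A unitary preserves normalization, so the probabilities still sum to one.
print(probs, sum(probs))
```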

Figure 2 illustrates the successive action of the Grover unitary iterate $G = U_{inv} U_f$ on an
n = 3 qubit state, corresponding to a search on $N = 2^3 = 8$ bits (adapted from [Yanofsky08]).
The goal of the GSA is to apply G successively so that the quantum state $|\psi\rangle$ is
steered towards the unknown “needle” component state labeled $|x_0\rangle$. Queries to an oracle
(externally supplied to the questioner) formally answer the yes/no question “is a given x equal
to $x_0$?” In other words, the oracle computes the function f(x) with results $f(x_0) = 1$,
while $f(x \neq x_0) = 0$. In Figure 2, we choose the “needle” to be the state $x_0 = 5$.

•  Case n=3 qubits, $N = 2^3 = 8$ states, solution: $x_0 = 5$ ($f(x_0) = 1$, else $f(x) = 0$)
•  Initialization: $|\psi_0\rangle = [1,1,1,1,1,1,1,1]/\sqrt{8}$
•  First Grover iteration:
   -  Step 1: Flag the result: $|\psi_1\rangle = [1,1,1,1,1,-1,1,1]/\sqrt{8}$
   -  Step 2: Inversion about mean, $c_x \to -c_x + 2\bar{c}$: $|\psi_2\rangle = [1,1,1,1,1,5,1,1]/(2\sqrt{8})$
•  Second Grover iteration:
   -  Step 1: Flag the result: $|\psi_3\rangle = [1,1,1,1,1,-5,1,1]/(2\sqrt{8})$
   -  Step 2: Inversion about mean, $c_x \to -c_x + 2\bar{c}$: $|\psi_4\rangle = -[1,1,1,1,1,-11,1,1]/(4\sqrt{8})$

Figure 2. Outline of Grover’s search algorithm on n=3 qubits ($N = 2^3 = 8$ bits).

The GSA begins as follows (see Figure 2). We initialize the system to the equal-amplitude
unbiased state $|\psi_0\rangle$, where, in this example, each amplitude has the value
$1/\sqrt{8}$. This implies there is an equal probability of 1/8 to find the system in any state
$|x\rangle$ upon measurement. The Grover iterate G involves two separate unitary operations,
applied in right-to-left order, to the quantum state. The first unitary $U_f$ implements the
effective operation $U_f|x\rangle = (-1)^{f(x)}|x\rangle$ on each component state $|x\rangle$,
with the net effect of “tagging” the solution state $|x_0\rangle$ with a minus sign, while
leaving all states $|x \neq x_0\rangle$ unchanged. The second unitary $U_{inv}$ of the Grover
iterate implements an “inversion about the mean” of all the quantum amplitudes. This implements
the operation $c_x \to 2\bar{c} - c_x$, where $\bar{c} = \sum_x c_x / N$ is the average of all
the quantum amplitudes $c_x$.

After one application of G we observe in Figure 2 that the amplitude of the state $|x_0\rangle$
has increased to five times that of all other states, implying that it is 25X more likely
($|c_{x_0}/c_x|^2$) to be observed upon measurement than any one of the remaining states. After
a second application of the Grover iterate, Figure 2 reveals that this likelihood has increased
to 121X ($|c_{x_0}/c_x|^2$). It can be shown that the optimal number of Grover iterations k to
achieve maximum probability to observe the solution state upon measurement is
$k \approx \lfloor \pi\sqrt{N}/4 \rfloor$ (where the notation $\lfloor z \rfloor$ denotes
floor(z), the largest integer not exceeding z).
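
The two steps of the Grover iterate can be checked numerically with a short state-vector simulation. The sketch below is a plain serial illustration written for this text, not the MPI/CUDA codes developed in the project; it assumes real amplitudes and a single solution, and its function names are hypothetical. For n = 3 and $x_0 = 5$ it reproduces the amplitude ratios 5 and 11 (in magnitude) quoted above.

```python
import math


def oracle_flag(amplitudes, solution):
    """U_f: flip the sign of the solution amplitude, leave all others unchanged."""
    return [-a if x == solution else a for x, a in enumerate(amplitudes)]


def invert_about_mean(amplitudes):
    """U_inv: map each amplitude c_x to 2*mean - c_x."""
    mean = sum(amplitudes) / len(amplitudes)
    return [2 * mean - a for a in amplitudes]


def grover(n_qubits, solution, iterations=None):
    """Return the list of amplitude vectors after 0, 1, ..., k Grover iterations."""
    n_states = 2 ** n_qubits
    if iterations is None:
        iterations = math.floor(math.pi * math.sqrt(n_states) / 4)
    amplitudes = [1 / math.sqrt(n_states)] * n_states   # unbiased initial state
    history = [amplitudes]
    for _ in range(iterations):
        amplitudes = invert_about_mean(oracle_flag(amplitudes, solution))
        history.append(amplitudes)
    return history


history = grover(n_qubits=3, solution=5)
for k, amps in enumerate(history):
    # Ratio of the solution amplitude to a non-solution amplitude, and P(x0).
    print(k, amps[5] / amps[0], amps[5] ** 2)
```

After the second iteration the probability of measuring $x_0$ is 121/128, consistent with stopping at k = 2 for N = 8.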

3.2  Grover’s  search  algorithm  with  an  entangled  database  state 

Grover’s search algorithm [Grover97] is one of the most highly recognized quantum algorithms
(next to Shor’s factorization algorithm [Shor94]), being widely taught in many texts on quantum
computation [Kaye07, Yanofsky08], and serves as a benchmark for nascent physical
implementations of quantum computers [Walther05]. Formally, Grover’s search algorithm (GSA)
considers the following scenario [Boyer96]: suppose you have a large table T[0..N-1] of N
entries in which you would like to find some element $z_0$. More precisely, you wish to find an
integer $x_0$ such that $0 \le x_0 < N$ and $T[x_0] = z_0$, provided that such an $x_0$ exists.
If the table is sorted the problem can be solved in a time O(log N). However, in many
interesting problems, ordering or structuring the data may not be possible or practical, and one
must resort to the brute force method of exhaustively searching through all the data until the
result is found (or to determine if it even exists). Classically, there is no algorithm that
succeeds with probability greater than 1/2 without searching through more than half the entries
of T. Grover [Grover97] described his algorithm as finding a needle in the haystack, and
equivalently as finding the associated name in a telephone book when one is supplied with a
given telephone number (in which the telephone book is sorted on the names, but random on the
telephone numbers). Grover’s quantum unstructured search algorithm can solve this problem on a
quantum computer in expected time in $O(\sqrt{N})$. The GSA has also been shown to be optimal
[Boyer96], implying that a quantum algorithm cannot achieve faster than a quadratic speedup over
its classical counterpart.

The GSA utilizes an oracle, which computes a function f(x) of the input x, but whose inner
workings are unknown and unavailable to the user. The Grover search problem can be stated
formally [Kaye07] as

The Grover Search Problem

Input: A black box (oracle) $U_f$ for computing an unknown function $f: \{0,1\}^n \to \{0,1\}$.
Problem: Find an input $x_0 \in \{0,1\}^n$ such that $f(x_0) = 1$ and $f(x \neq x_0) = 0$.

In the above, f is the classical function which evaluates to “yes” on the needle and “no” on the
more abundant pieces of hay in the haystack. $U_f$ is the unitary representation of f which,
acting on x encoded into a quantum multi-qubit state $|x\rangle$, performs the reversible
operation

$$U_f |x\rangle \otimes |y\rangle = |x\rangle \otimes |y \oplus f(x)\rangle, \tag{1}$$

where $|y\rangle$ is a single auxiliary qubit and $\oplus$ denotes binary (mod 2) addition.
(Note: from now on we will often write the tensor product of states $|x\rangle \otimes |y\rangle$
simply as $|x\rangle|y\rangle = |x, y\rangle$.)

As is well known, and as will be explicitly illustrated below, $U_f$ requires knowledge of the
solution $x_0$ in order to be explicitly constructed [Yanofsky08]. This is why the oracle $U_f$
is part of the input to the GSA, and it has to be supplied externally to the user performing the
search. Recently, there has been interest in developing algorithms that would dispense with the
Grover oracle $U_f$ and encode the search list directly into a quantum database state which can
be initially constructed (e.g. an encoding of a telephone book), and subsequently searched at a
later time (e.g. given a telephone number, find the associated name). Xu et al. [Xu08] have
designed such an $O(\sqrt{N})$ algorithm based on adiabatic quantum computing (AdQC) and
experimentally demonstrated its operation on a two qubit “telephone book” in an NMR quantum
computer. In their work, only the names were encoded into the quantum database state, while the
telephone numbers were encoded as classical integers. The goal of our work was to formulate a
quantum search algorithm (QSA), analogous in spirit to Xu et al. [Xu08], but in the usual
quantum circuit model paradigm (i.e. an explicit unitary operator approach vs the Hamiltonian
approach of AdQC).

The  Grover  iteration  and  the  phase  kickback  or  solution  tagging  operation 

The essential operation in Grover’s algorithm is the Grover iteration [Kaye07]

$$G = U_{\psi^\perp} U_f, \tag{2}$$

composed of two functionally distinct unitary components (i) $U_f$, the phase kickback (PK) or
“solution tagging” operation, and (ii) $U_{\psi^\perp}$, the inversion about the mean (IAM). The
phase kickback unitary operation works as follows

$$U_f |x\rangle \otimes |-\rangle = U_f |x\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}} = |x\rangle \otimes \frac{|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle}{\sqrt{2}} = (-1)^{f(x)} |x\rangle \otimes |-\rangle, \tag{3}$$

where we have used (1) with $|y\rangle = H|1\rangle = |-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$,
with the Hadamard operator

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

which also maps $H|0\rangle = |+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. In (3), $U_f$ acts
on both terms of $|y\rangle$ simultaneously (quantum parallelism). Explicitly working out both
cases $f(x_0) = 1$ and $f(x \neq x_0) = 0$ yields two results that can be encapsulated into the
single statement given by the rightmost expression. This is the famous phase kickback [Grover97,
Kaye07, Yanofsky08] in which the evaluation of the function f is stored in the quantum phases
$e^{i\theta}$ (here, with $\theta \in \{0, \pi\}$). Since the single auxiliary qubit
$|y\rangle = |-\rangle$ is returned to its initial state after the PK operation, one often
abbreviates (3) to

$$U_f |x\rangle = (-1)^{f(x)} |x\rangle, \tag{4}$$

with the understanding that the required auxiliary qubit $|y\rangle$ is implied. The net result
of $U_f$ is that the sought after solution state $|x_0\rangle$ is flagged with a -1, while all
other states $|x \neq x_0\rangle$ are unchanged. An explicit matrix realization [Yanofsky08] of
the unitary operator $U_f$ is shown in Fig. 3 for the case of two qubits with solution state
$x_0 = 2$ in the decimal representation, corresponding to 10 in the binary representation. (From
now on we will primarily use the decimal representation $x \in \{0,1,\ldots,N-1\}$ on n qubits,
where $N = 2^n$.) Figure 3 illustrates the assertion that one needs to know the solution $x_0$
in order to construct the oracle $U_f$.


$$U_f^{xy} = \begin{pmatrix} I_2 & 0 & 0 & 0 \\ 0 & I_2 & 0 & 0 \\ 0 & 0 & X_2 & 0 \\ 0 & 0 & 0 & I_2 \end{pmatrix}$$
(diagonal blocks ordered x = 0, 1, 2, 3; within each block the rows and columns are ordered y = 0, 1)

Figure 3. Explicit construction of the unitary phase kickback operator $U_f^{xy}$ for the
case of two qubits labeled by the decimal $x \in \{0,1,2,3\}$ (binary {00,01,10,11})
representation. In this example, the solution state is $x_0 = 2$ (binary 10), $X_2$ denotes the
2x2 Pauli $\sigma_x$ bit-flip matrix and $I_2$ denotes the 2x2 unit matrix. The superscript xy on
$U_f^{xy}$ denotes that the PK operation acts upon the 3-qubit state
$|x\rangle \otimes |y\rangle$, where y is a single qubit auxiliary state.

In Fig. 3, the $x = x_0 = 2$ diagonal block of $U_f^{xy}$ contains the 2x2 Pauli bit-flip matrix, denoted as $X_2$, which flips the single auxiliary $y$ qubit. All other ($x \neq x_0$) diagonal blocks of $U_f^{xy}$ contain the 2x2 identity matrix, denoted as $I_2$, which leaves the $y$ qubit unaltered. With the auxiliary qubit prepared in $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$, the net effect is to multiply the state $|x_0\rangle \otimes |-\rangle$ by the phase factor $-1$, leaving all other states unaltered. Since $N = 4 = 2^2$ for the two qubits in this example, there are 4 diagonal blocks in which to place the bit-flip operator $X_2$. The choice of the specific diagonal block in which $X_2$ is placed is determined by the solution state $x_0$. Formally, the PK operation has the form

$$U_f^{xy} = |x_0\rangle\langle x_0| \otimes X_2 + \sum_{x \neq x_0} |x\rangle\langle x| \otimes I_2,$$

which explicitly illustrates this point. Thus, the construction of the PK operator requires knowledge of the solution state $x_0$. This is the primary reason why $U_f$ is given as an "input" to the GSA, and is considered as an externally provided oracle.
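
The block structure described above can be checked numerically. The following is a minimal sketch, assuming the basis ordering index = 2x + y and the two-qubit example with solution state x0 = 2; the helper names (`phase_kickback_operator`, `x_tensor_minus`, `apply`) are illustrative and do not come from the report.

```python
import math

I2 = [[1, 0], [0, 1]]   # 2x2 unit matrix
X2 = [[0, 1], [1, 0]]   # 2x2 Pauli bit-flip matrix


def phase_kickback_operator(n_states, x0):
    """Block-diagonal U_f: X2 in the x0 block, I2 in every other block."""
    dim = 2 * n_states
    U = [[0] * dim for _ in range(dim)]
    for x in range(n_states):
        block = X2 if x == x0 else I2
        for r in range(2):
            for c in range(2):
                U[2 * x + r][2 * x + c] = block[r][c]
    return U


def apply(U, v):
    """Matrix-vector product."""
    return [sum(u_rc * v_c for u_rc, v_c in zip(row, v)) for row in U]


def x_tensor_minus(n_states, x):
    """|x> (x) |->, with |-> = (|0> - |1>)/sqrt(2); index = 2*x + y."""
    s = 1 / math.sqrt(2)
    v = [0.0] * (2 * n_states)
    v[2 * x], v[2 * x + 1] = s, -s
    return v


N, x0 = 4, 2
U_f = phase_kickback_operator(N, x0)
signs = []
for x in range(N):
    v = x_tensor_minus(N, x)
    w = apply(U_f, v)
    signs.append(round(w[2 * x] / v[2 * x]))  # w equals +v or -v
print(signs)  # [1, 1, -1, 1]
```

Only the x = x0 state picks up the factor -1. Note that the function needs x0 as an argument, which mirrors the point made in the text: the operator cannot be built without knowing the solution.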

The inversion about the mean operation


The second unitary in the Grover iterate (2) is the inversion about the mean operation, given by

$$U_{\psi^\perp} = H^{\otimes n}\, U_{0^\perp}\, H^{\otimes n}, \qquad (5)$$

where $H^{\otimes n}$ takes the $n$-qubit initial state $|0\rangle$ to the unbiased, equal amplitude product state $\bigotimes_{i=1}^{n} H|0\rangle_i$ (the $n$-fold tensor product of $H|0\rangle$ over all qubits). For $n$ qubits, we will denote this for simplicity as

$$H|0\rangle = |\psi\rangle. \qquad (6)$$

The operator $U_{0^\perp}$ is defined as

$$U_{0^\perp}|0\rangle = |0\rangle, \qquad U_{0^\perp}|x \neq 0\rangle = -|x \neq 0\rangle. \qquad (7)$$

Note that $U_{0^\perp}$ does not require knowledge of the solution state $x_0$; it simply flips the sign of all states except the standard initial state $x = 0$. $U_{0^\perp}$ therefore has the representation

$$U_{0^\perp} = |0\rangle\langle 0| - |1\rangle\langle 1| - \cdots - |N-1\rangle\langle N-1| = 2|0\rangle\langle 0| - I_N,$$

where we have used the completeness relation $I_N = \sum_x |x\rangle\langle x|$, with $I_N$ being the NxN unit matrix. Thus, using (6) we can express $U_{\psi^\perp}$ in (5) as

$$U_{\psi^\perp} = 2|\psi\rangle\langle\psi| - I_N. \qquad (8)$$

A straightforward calculation [Kay07,Yanofsky08] reveals that $U_{\psi^\perp}$ maps the amplitudes $c_x$ of an arbitrary quantum state $|\varphi\rangle = \sum_x c_x |x\rangle$ according to $c_x \to 2\bar{c} - c_x$, where $\bar{c} = \sum_x c_x / N$ is the average of all the quantum amplitudes $c_x$. This is easily seen since $\mathrm{Avg} = |\psi\rangle\langle\psi|$ is the matrix with each entry taking the value $1/N$, which maps an arbitrary quantum vector $|\varphi\rangle$ to a vector whose every component is $\bar{c}$. Thus, $U_{\psi^\perp}$ performs an inversion of each quantum amplitude $c_x$ about the mean value $\bar{c}$. It has been shown that after $k \approx \lfloor \pi\sqrt{N}/4 \rfloor$ successive Grover iterations the state $|\psi_k\rangle = G^k|\psi\rangle$ reaches maximal probability to be in the state $|x_0\rangle$.
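
The amplitude update above is easy to simulate classically for small N. The sketch below is illustrative only: it assumes real amplitudes, models the oracle as a sign flip on the solution amplitude, and applies the map c_x -> 2c̄ - c_x for k = floor(π√N/4) iterations. The function name `grover_probabilities` is hypothetical and not part of the report's simulation code.

```python
import math


def grover_probabilities(n_qubits, x0):
    """Return (k, probabilities) after k = floor(pi*sqrt(N)/4) Grover iterations."""
    N = 2 ** n_qubits
    amps = [1 / math.sqrt(N)] * N              # |psi> = equal amplitude state
    k = math.floor(math.pi * math.sqrt(N) / 4)
    for _ in range(k):
        amps[x0] = -amps[x0]                   # oracle: phase factor -1 on |x0>
        mean = sum(amps) / N                   # average amplitude c-bar
        amps = [2 * mean - c for c in amps]    # inversion about the mean
    return k, [c * c for c in amps]


k, probs = grover_probabilities(n_qubits=4, x0=10)
print(k, round(probs[10], 4))
```

For 4 qubits (N = 16) this uses k = 3 iterations, after which nearly all of the probability sits on the solution index.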


3.3  Quantum  computing  in  a  piece  of  glass  using  volume  holograms 

Volume holography is used today for 2D image storage utilizing 394 pixels/$\mu$m$^2$, which consumes only 1% of the theoretical volumetric storage density ($1/\lambda^3$) [Burr01]. The field was first introduced by Dennis Gabor in 1948 and has been well established since the development of the laser in 1960. As the emulsion of the hologram increases in thickness, its angular selectivity, i.e. its ability to differentiate between two planewaves separated by a small angle, increases, and it is able under certain well-known conditions to achieve near perfect efficiency [Goodmann05]. A hologram is considered a volume hologram if the emulsion thickness $d \gg \Lambda^2/\lambda$, where $\Lambda$ is the characteristic period of the index of refraction of the grating, and $\lambda$ is the wavelength of the light. It is important for our purposes to emphasize that volume holography enables higher storage densities, and under suitable recording configurations can achieve near perfect efficiencies.

The transmission volume holograms we consider are formed when a "signal" wave, $\langle r|S\rangle = A(r)e^{i\phi(r)}$, is directed into a holographic emulsion and made to coherently interfere with an oblique "reference" planewave, $\langle r|R\rangle$, as illustrated in the left diagram of Fig. 4 for N=3. In the figure the "signal" wave is a superposition of N planewaves,

$$|S\rangle = \sum_{i=1}^{N} e^{i\alpha_i} |S_i\rangle,$$

where $\alpha_i$ are pure phase angles. Here, we only consider planar reference waves, and the signal state as the superposition of planewaves. Ordinarily the signal waves will have variable phase and amplitude modulations. After the hologram is developed, if we direct the identical signal wave $\langle r|S\rangle$ into the hologram, then for a perfectly tuned hologram the reference planewave $\langle r|R\rangle$ should emerge, as illustrated in the right diagram of Fig. 4. If the photo-thermal refractive (PTR) glass is not tuned to the correct thickness, other diffracted orders, e.g. modes parallel to the signal states, may emerge.

Figure 4. Recording (left) and reconstruction (right) by a volume hologram transmission grating.

In Fig. 4 the left diagram shows a recording of a volume transmission grating by the coherent superposition of a plane reference wave $|R_1\rangle$ and a linear superposition of three signal waves, $|S\rangle = e^{i\alpha_1}|S_1\rangle + e^{i\alpha_2}|S_2\rangle + e^{i\alpha_3}|S_3\rangle + e^{i\alpha_4}|S_4\rangle$. On the right we show the function of the hologram. If the identically oriented signal wave $|S\rangle$ is sent into the hologram then the reference wave $|R_1\rangle$ will be reconstructed in the diffraction. The diffraction pattern will ordinarily include higher order diffracted modes parallel to the signal state. However, for a suitably tuned volume hologram perfect efficiency can be achieved, as shown in the right diagram of Fig. 4 [Miller11a, b]. This is why we constrained the signal wave components to a cone of half angle $\theta_s$ centered on the normal to the hologram face.

Recently we have shown [Miller11a] that near perfect efficiencies can be obtained if (1) the hologram thickness is tuned to its optimal value, (2) each of the signal's Fourier wavevectors has the same projection onto the normal to the hologram surface, i.e. they all lie on a cone with half angle $\theta_s$ as shown in Fig. 4, and (3) each of the reference planewaves lies on its own distinct cone concentric with the first, with half angle $\theta_r$ and centered on the normal to the hologram face. We have also considered multiplex holograms wherein multiple independent exposures are made within the holographic emulsion before it is developed. We demonstrated using coupled-mode theory that if the signal waves $\{S_i\}_{i=1,2,\ldots,N}$ form an orthogonal set under the $L^2$ norm in the plane perpendicular to the waves' propagation direction (z), i.e.

$$\langle S_i | S_j \rangle = \int S_i^*(x, y)\, S_j(x, y)\, dx\, dy = \delta_{ij}, \qquad (9)$$

then perfect efficiency can be achieved for each of the signals [Miller11a].

A volume multiplexed hologram that has achieved perfect efficiency (within coupled-mode theory [Kogelnik69]) under the "3+1" conditions outlined above provides a linear map between signal and reference modes. Physically it represents a projection (or redirection) operator, or signal state sorter [Miller11a, b],

$$\hat{P} = |R_1\rangle\langle S_1| + |R_2\rangle\langle S_2| + \cdots + |R_N\rangle\langle S_N|, \qquad (10)$$

uniquely identifying each pair of signal and reference waves. Although the index of refraction within the emulsion can be rather complicated, these devices are strictly linear optical components. Therefore, the diffraction patterns for a beam of photons will correspond exactly to the probability distribution for a single photon in the beam. In our work we assumed that we were dealing with low number Fock states. In section 4.3 we show how this theory can be applied to develop a CNOT gate using stacked holograms in PTR glass.
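
The action of such a signal state sorter can be sketched with small complex vectors. The example below is a toy model under stated assumptions: the signal modes are taken to be discrete Fourier vectors (orthonormal, standing in for orthogonal signal waves) and the reference modes are standard basis vectors. The names `S`, `R`, `inner` and `sort_signal` are hypothetical, and no coupled-mode physics is modeled.

```python
import cmath
import math

N = 3
# Assumed signal modes: discrete Fourier vectors, orthonormal by construction
S = [[cmath.exp(2j * math.pi * i * m / N) / math.sqrt(N) for m in range(N)]
     for i in range(N)]
# Assumed reference modes: one distinct output direction per signal mode
R = [[1.0 if m == i else 0.0 for m in range(N)] for i in range(N)]


def inner(u, v):
    """<u|v> with complex conjugation on the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def sort_signal(state):
    """Apply P = sum_i |R_i><S_i| to `state`."""
    out = [0j] * N
    for i in range(N):
        amp = inner(S[i], state)          # overlap with signal mode i
        for m in range(N):
            out[m] += amp * R[i][m]       # redirect into reference mode i
    return out


# A single signal mode is redirected entirely into its own reference mode
print([round(abs(c) ** 2, 6) for c in sort_signal(S[1])])

# A superposition of S_0 and S_2 gives the single-photon detection probabilities
mix = [(a + b) / math.sqrt(2) for a, b in zip(S[0], S[2])]
print([round(abs(c) ** 2, 6) for c in sort_signal(mix)])
```

Because the map is linear, the output probabilities for a superposition are just the squared overlaps with each signal mode, which is the sense in which the diffraction pattern of a beam matches the probability distribution for a single photon.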


3.4  Cluster  state/one-way  quantum  computation 

As a focus for our experimental efforts in QIS we initiated an investigation into the utility and feasibility of measurement-based quantum computation (MQC) as a computing paradigm [Briegel99]. MQC also goes by the name one-way quantum computation (OWQC) or cluster state quantum computation (CSQC) (see [Raussendorf01]) because the computation is driven by irreversible measurements performed on a large scale entangled resource state, rather than by a sequence of reversible unitary gates as in the usual quantum circuit model (QCM). The initial entanglement resources of the OWQC are called graph states (in general), or cluster states (a graph state arranged as a two or three dimensional regular grid). The appeal of MQC is that deterministic quantum computation is possible based on (i) the preparation of an initial entangled cluster state followed by (ii) a temporally ordered pattern of single qubit measurements and feed-forward operations which depend on the outcomes of the previously measured qubits [Raussendorf01]. Our interest in OWQC is in the utilization of photon-based cluster states to develop gates and circuits for quantum computation (see [Vallone08], and references therein).

In contrast to the quantum circuit model, where quantum computations are implemented by unitary operations, in the OWQC approach information is processed by sequences of single-qubit measurements. These measurements are performed on a universal resource state, the 2D-cluster state, which does not depend on the algorithm to be implemented. A one-way quantum computation proceeds as follows: (i) A classical input is provided which specifies the data and the program; (ii) A 2D-cluster state of sufficiently large size is prepared. The cluster state serves as the resource for the computation; (iii) A sequence of adaptive one-qubit measurements is implemented on certain qubits in the cluster. In each step of the computation the measurement bases depend on the specific program under execution and on the outcomes of previous measurements. A simple classical computer is used to compute which measurement directions have to be chosen in every step; (iv) After the measurements the state of the system has the product form $|\xi^\alpha\rangle |\psi^\alpha_{\mathrm{out}}\rangle$, where $\alpha$ indexes the collection of measurement outcomes of the different branches of the computation. The states $|\psi^\alpha_{\mathrm{out}}\rangle$ in all branches are equal to the desired output state up to a local (Pauli) operation. The measured qubits are in a product state $|\xi^\alpha\rangle$ which also depends on the measurement outcomes. The OWQC is computationally universal, i.e. even though the results of the measurements in every step of the computation are random, any quantum computation can deterministically be realized. Notice that the temporal ordering of the measurements plays an important role and has been formalized as a feed-forward procedure [Raussendorf01].
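
The smallest instance of this procedure is a two-qubit cluster, sketched below under simplifying assumptions: qubit 1 carries an input a|0> + b|1>, qubit 2 starts in |+>, a controlled-Z entangles them, and qubit 1 is measured in the X basis. The random outcome s leaves qubit 2 in X^s H|input>, so a Pauli X correction conditioned on s (the feed-forward step) makes the result deterministic. The function name `one_way_step` and the amplitude ordering are illustrative choices, not taken from the report.

```python
import math
import random

S2 = 1 / math.sqrt(2)


def one_way_step(a, b, rng):
    """Measure qubit 1 of a 2-qubit cluster in the X basis; return (s, qubit-2 state)."""
    # Amplitudes indexed by 2*q1 + q2: (a|0> + b|1>) (x) |+>
    state = [a * S2, a * S2, b * S2, b * S2]
    state[3] = -state[3]                       # controlled-Z: sign flip on |11>
    branches = []
    for s in (0, 1):
        sign = 1 if s == 0 else -1             # project onto (<0| + sign*<1|)/sqrt(2)
        out = [S2 * (state[0] + sign * state[2]),
               S2 * (state[1] + sign * state[3])]
        prob = sum(abs(c) ** 2 for c in out)
        branches.append((prob, out))
    s = 0 if rng.random() < branches[0][0] else 1   # random measurement outcome
    prob, out = branches[s]
    out = [c / math.sqrt(prob) for c in out]        # renormalize the chosen branch
    if s == 1:
        out = [out[1], out[0]]                      # feed-forward: Pauli X correction
    return s, out


s, out = one_way_step(0.6, 0.8, random.Random(1))
print(s, [round(c, 4) for c in out])
```

Whatever the outcome s, the corrected output equals H applied to the input, here ((a+b)/√2, (a-b)/√2). Larger computations chain such steps, with later measurement bases depending on earlier outcomes.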

In  realistic  physical  systems  decoherence  tends  to  make  quantum  systems  behave  more 
classically.  One  could  therefore  expect  that  decoherence  would  threaten  any  computational 
advantage  possessed  by  a  quantum  computer.  However,  the  effects  of  decoherence  can  be 
counteracted  by  quantum  error  correction  [Shor96].  In  fact,  arbitrarily  large  quantum 
computations  can  be  performed  with  arbitrary  accuracy  provided  the  error  level  of  the 
elementary  components  of  the  quantum  computer  is  below  a  certain  threshold.  This  important 
result  is  called  the  threshold  theorem  of  quantum  computation  [Aliferis06]. 

Fault-tolerant schemes for OWQC using photons have recently been developed [Dawson06, Varnava06]. The dominant sources of error in this setting are photon loss and gate inaccuracies. The constraint of short-range interaction and arrangement of qubits in a 2D lattice, a characteristic feature of the initial one-way quantum computer, is not relevant for photons. In [Dawson06] both photon loss and gate inaccuracies were taken into account, yielding a trade-off curve between the two respective thresholds. Fault-tolerant optical computation is possible for a gate error rate of $10^{-4}$ and photon loss rate of $3 \times 10^{-3}$. In [Varnava06] the stability against the main error source of photon loss was discussed. With non-unit efficiencies $\eta_S$ and $\eta_D$ of photon creation and detection being the only imperfections, the very high threshold of $\eta_S \eta_D > 2/3$ was established. Further, encoding a collection of physical qubits within the 2D cluster state offers a means of topological error protection for the logical qubit. Topologically protected quantum gates are performed by measuring some regions of qubits in the Z-basis, which effectively removes the qubits from the state. The remaining cluster, whose qubits are measured in the X- and X±Y-basis, thereby attains a non-trivial topology in which fault-tolerant quantum gates can be encoded. A topological method of fault-tolerance for OWQC can then be achieved [Raussendorf07]. In the work reported here we numerically studied the evolution and topological protection of a maximally entangled Bell state pair from an initial 2D plane to a terminal 2D plane in a 3D rectangular cluster state.

3.5  Quantum  information  science  testbed 

To perform quantum information experiments a testbed was constructed to generate photon-based quantum bits. These polarization-entangled photons were generated via the process of spontaneous parametric down conversion (SPDC). This involves a source of light with a characteristic higher energy (e.g. “blue” light) spontaneously splitting into two correlated photons of lower energy (e.g. “red” light).


Figure  5.  Type-II  SPDC  photon  source  (left)  and  resulting  (unnormalized)  entangled  photon 
polarization  state.  In-house  laboratory  images  (right)  showing  SPDC  ring  evolution  with  the 
variation  of  the  crystal  orientation. 

Energy and momentum are conserved in the process, so the energies and directions of propagation of these photon pairs are correlated. The polarization of the light is an additional parameter that can be correlated. There are two predominant forms of SPDC. In a single type-II crystal, the pair of photons emerges with orthogonal polarizations on two spatially separate cones (Fig. 5) due to the birefringence of the crystal. In a single type-I crystal the photons emerge on a single cone (diametrically opposed) with the same polarization. In practice, two type-I crystals are used to produce two overlapping cones of two distinct polarizations (Fig. 6). For both types of SPDC, regions where the cones overlap are potential candidates for extracting polarization-entangled photon pairs. These sources are described in greater detail in the following paragraphs. These bulk crystal based photon sources are the fundamental basis on which the testbed is constructed, with the other main components being the continuous wave (CW) and pulsed pump lasers and the single photon detectors.

Figure 6. Type-I pair SPDC photon source (left) and resulting (unnormalized) entangled polarization state. In-house laboratory images (right) showing SPDC ring evolution with the variation of the crystal orientation.

High intensity type-II SPDC sources described by Kwiat [Kwiat95] served as the first realizable source for the generation of entangled photons. The output of this source comprises two orthogonally-polarized entangled photons (signal and idler) produced upon excitation by a linearly-polarized pump laser beam. Due to the inherent birefringence of the crystal there is noticeable signal-idler walk-off, which leads to the familiar double ring pattern illustrated in Fig. 5. The intersections of the two orthogonally-polarized rings are regions of photon indistinguishability where entanglement occurs. Variation of the crystal orientation changes the size, and therefore the intersection points, of the rings as shown in Fig. 5. The typical operational configuration is collinear or tangential, where the two rings intersect at nearly 90°. This produces a Gaussian-like beam profile which gives a high coupling efficiency into optical fiber.

Type-I crystals have been used for many years as frequency converters for second harmonic generation (SHG). The signal and idler photons produced from type-I down-conversion are both polarized orthogonally to the linearly-polarized pump beam. The fact that the signal and idler photons both have the same polarization mitigates the walk-off problem due to the birefringence of the crystal. Varying the crystal orientation produces either a single output cone or a single beam with respect to the linear pump beam. Kwiat first described the use of type-I crystals as a feasible source for SPDC-generated entangled photons with the development of the type-I pair design [Kwiat99]. This consisted of a pair of type-I crystals rotated with their optic axes orthogonal to each other. This allows for the production of two orthogonally-polarized cones of photons (see Fig. 6) which overlap upon correct rotation of the crystal. The pump polarization must also be changed from purely horizontal or vertical, as for a single type-I crystal, to 45° in order to excite both crystals. Since signal-idler walk-off due to birefringence is not an issue in type-I crystals, this source is more efficient than a type-II source. This is due to the longer interaction length over which the photons remain entangled, thus allowing for longer crystals. Further, in a configuration in which the two rings overlap, photons along the entire ring are indistinguishable, allowing any diametrically opposite pair to be collected and utilized [Dragoman01]. The fundamental collection limit of this source is governed by the bulk size of the hardware, namely how many apertures can be stationed in front of the ring for collection of the diametric pairs.

Various other schemes have been developed for increasing the useable output of type-II sources to limits approaching that of type-I. Bitton et al. describe a type-II pair with each crystal’s optical axis rotated 180° with respect to the other (see Fig. 7) [Bitton01]. This allows the linear pumping scheme to remain unchanged while allowing both crystals to produce one set of rings each, with the polarization orientation rotated 180°. In this configuration any selected diametric pair across either ring is indistinguishable and useable, and the size of the collection apertures becomes the limiting factor in the number of diametric pairs that can be collected.

Figure 7. Type-II entangled photon pair as described by Bitton et al. [4].

U’Ren  et  al.  described  a  type-II  crystal  assembly  (see  Fig.  8)  that  is  designed  for  group  velocity 
matching  (GVM)  of  the  pump  and  signal/idler  wave  packets,  thereby  removing  the  spectral 
distinguishability  of  the  photons  [U’Ren06].  The  symmetric  nature  of  the  joint  spectral  function 
of  the  entangled  photons  produced  from  this  crystal  removes  the  need  for  spectral  filtering  of  the 
down-converted  photons  inherent  to  all  current  SPDC  sources.  This  increases  the  percentage  of 
useable  entangled  photons  produced  from  a  single  type-II  crystal.  This  source  will  be  described 
in  greater  detail  in  sections  3.6  and  4.6. 


Figure  8.  Type-II  custom  assembly  showing  alternating  BBO  (red)  and  calcite  (blue)  segments. 

With an ever increasing need for larger numbers of entangled photon pairs, new sources must either produce more photons or the efficiency must be increased to compensate for the spontaneous nature of the source. A particular area of interest where larger numbers of photons are desired is photon-based cluster state quantum computing (CSQC). In CSQC individual pairs of photons are entangled together to form larger arrays of entangled photons. Typically, large numbers of single pairs are generated by cascading or multi-passing the excitation beam through SPDC sources as shown in Fig. 9 [Lu07]. In a typical configuration each of these sources produces a single pair of entangled photons. Obtaining a larger photon number requires an increase in the overall footprint size of the experimental setup.

Figure 9. Experimental configuration for the generation of entangled photon cluster states [6].

3.6  Temporally  compensated  crystal  assembly 

The outline of this section is as follows. We first describe multi-crystal interference, particularly for type-II SPDC, with key implications for separable quantum states. Next, the significance of group velocity matching (GVM) in such states is discussed. We then discuss prototypes of new methods for implementing GVM, which were designed and assembled so that initial spectral tests could be performed. Finally, brief mention is given to how the methods can be generalized to increase control of the SPDC spectral function, enabling applications in regions that have not been accessible with other methods.


Consider amplitudes that yield coincidence counts in the single photon counting detectors (D1, D2), where the events are: a photon with frequency $\omega_1$ in D1, and a photon with frequency $\omega_2$ in D2.

Two paths to this event, via transmission Tx-Tx OR reflection Rx-Rx, are in principle indistinguishable, and exhibit quantum interference by a null in coincidence counts, since the photons always go to the same detector.

Note that the high spectral entanglement in frequency yields no distinguishing information to path events.

Figure 10. Hong-Ou-Mandel interferometer single SPDC source.

In an application of photon entanglement it is essential to designate which photon properties (momentum, energy (spectral), polarization, spatial, temporal, etc.) in a given configuration are to be entangled, and to ensure that no others yield information that degrades the desired quality of interference. Quantum interference relies on indistinguishable amplitudes (“Feynman paths”) leading to an event. In this case the event will be photon pair detection in coincidence counting modules. To illustrate, consider first the Hong-Ou-Mandel interferometer (HOMI, Fig. 10), where two photons meet at a beam splitter (BS). If the wave packets of the two photons are coherent with one another, they will always exit the same port of the BS because the probability amplitudes for the photons exiting different ports cancel, i.e. interfere destructively. Such a simplified single-mode treatment based on a photon’s bosonic symmetry is sufficient for conceptual analysis, but not to describe an actual experiment. SPDC photons are far from single mode, even when the pump beam is CW and nearly in a single spectral mode. The photons in SPDC are in fact emitted as wave packets, with finite spectral and temporal bandwidths that can be Fourier transform limited. Each photon can exhibit any spectral value within its envelope. Thus, to explain the HOMI effect with such wave-packets, it must be clear that the spectral properties cannot provide distinguishing information on the Feynman event paths (Fig. 11).
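
The cancellation of the two coincidence paths can be illustrated with the single-mode amplitudes alone. The sketch below assumes a lossless 50:50 beam splitter with transmission amplitude t = 1/√2 and reflection amplitude r = i/√2, and models partial distinguishability by a single wave-packet overlap parameter between 0 and 1. The function name and the overlap parameter are illustrative assumptions, not quantities defined in the report.

```python
import math


def coincidence_probability(overlap, t=1 / math.sqrt(2), r=1j / math.sqrt(2)):
    """Probability of one photon in each detector behind the beam splitter."""
    # Indistinguishable paths: the Tx-Tx and Rx-Rx amplitudes add, then are squared
    indistinguishable = abs(t * t + r * r) ** 2
    # Distinguishable paths: the two probabilities add instead
    distinguishable = abs(t * t) ** 2 + abs(r * r) ** 2
    v = abs(overlap) ** 2
    return v * indistinguishable + (1 - v) * distinguishable


for overlap in (1.0, 0.5, 0.0):
    print(overlap, round(coincidence_probability(overlap), 4))
```

With full overlap the coincidence rate vanishes (the null described in Fig. 10), while any information that distinguishes the two paths, such as spectral labels, restores coincidences up to the classical value of 1/2.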


CW pump laser photons: infinite length; SPDC time of event unknown. But SPDC photon (pairs) are wave-packets with finite temporal-spectral widths.

CW pump => photons infinitely long in time, with narrow delta function spectrum, $\delta(\omega_p)$. Entangled pair photons each have spectral and temporal width (e.g. 10 fs / 50 nm), BUT the joint spectral width is narrow due to conservation of pump photon energy. So the entangled pair's

JOINT SPECTRAL FUNCTION $= \delta(\omega_s + \omega_i - \omega_p) = \delta(\nu_s + \nu_i)$,

where $\nu_{s,i} = \omega_{s,i} - \omega_p/2$.

Figure 11. Photon wave-packets generated with CW Pump.

It is emphasized that it is not relevant whether the spectral detections are actually carried out; it matters only that the measurements could in principle be made. That is, it is the possible events, not the actual ones, that determine the quantum amplitudes used to calculate (probabilistic) experimental results. Spectrally-resolved single-photon detection is cumbersome and seldom carried out, but it could be done using dispersed arrays of single-photon counters.

We return to the problem of critical interest: how to make use of many independent photon sources, essential to producing more than two entangled photons or two qubits. A possible first step is to replace a CW pump source with short pulses that have a broad spectrum and well defined pair-creation time intervals, which can effectively overlap from many sources. This approach enables, but does not optimize, the process efficiency and purity of quantum interference. An analysis of distinguishing information is required, particularly the photons’ spectral state function. To eliminate path distinction, spectral state information regarding one photon state must yield no identifying information regarding the other photon state. This is explicitly shown by Grice [Grice01] to hold when the two-photon state probability distribution is separable into a product state for each photon, i.e. $F(\nu_s, \nu_i) = f(\nu_s)\, g(\nu_i)$. In that case knowledge of the value of $\nu_s$ provides no information on the value of $\nu_i$. This contrasts (Fig. 12) with a CW pump spectral function $\delta(\nu_s + \nu_i)$, where knowledge of $\nu_i$ determines $\nu_s$ exactly; this state is not separable in frequency. Thus, the issue becomes how to generate separable spectral states that can be realized in SPDC. The most direct example would be the product of spectral bandpass filters placed before the detectors, to contribute a spectral response of the form $f(\nu_s)\, g(\nu_i)$. However, this is only realizable in practice if the spectral contributions of the pump photons and the crystal are neglected, since the latter two are, in general, not separable. If the filters are sufficiently narrowband, though, their form factor predominates and makes the (separable) Gaussian filter product a good approximation to the experimental distribution.

[Figure 12 diagram labels: Pump laser; SPDC Crystals; Det 1; Det 2; Det 3; Det 4]

QUESTION
If photons ($\nu_1$ and $\nu_2$) and ($\nu_3$ and $\nu_4$) are entangled, can $\nu_2$ and $\nu_3$ be entangled after a B.S. as in a HOMI?

New spectral function: $\delta(\nu_1 + \nu_2)\, \delta(\nu_3 + \nu_4)$

Problem: In this configuration, the spectral entanglement identifies the source (path) of each photon; such information precludes the quantum path interference.

SOLUTION
A. Eliminate the distinguishing spectral information!
B. Cannot use a CW pump laser; use short pump pulses, having a broad spectrum.
C. Then tailor the source crystal(s) to yield a JOINT SPECTRAL FUNCTION with the required factorizable properties.

$$|\psi\rangle = \iint d\nu_s\, d\nu_i\; \alpha(\nu_s + \nu_i)\, \phi(\nu_s, \nu_i)\, |\nu_s\rangle |\nu_i\rangle$$

Figure 12. Spectral distinguishability in multi-source entangled photon interference.


This is indeed how nearly every multi-source experiment (Fig. 13) to date has achieved the required separability, sometimes without explicit awareness thereof [Zeilinger05, Pan07]. Unfortunately the vast majority of entangled photons are necessarily discarded in this process.

Figure 13. Experimental configuration for the generation of entangled photon cluster states.

Note: circular symmetry is not related to a separable state. In particular, CIRC($x^2 + y^2$) is symmetric, but is not factorable.


$\nu_s$ and $\nu_i$ are maximally entangled; the value of $\nu_s$ determines that of $\nu_i$.

gauss[$\nu_s$] gauss[$\nu_i$] is separable; thus the value of $\nu_s$ yields no information on $\nu_i$. However, this function is not physically realizable in SPDC experiments.

$\alpha_{\mathrm{pump}} \cdot \phi = e^{-(\nu_s + \nu_i)^2} \cdot e^{-(\nu_s - \nu_i)^2} = e^{-2\nu_s^2}\, e^{-2\nu_i^2}$. This spectral function is separable: there is no entanglement between $\nu_s$ and $\nu_i$, and this can be experimentally realized.

Figure 14. Joint spectral functions for CW pump, and BBO under broadband pump and ideal GVM case.

There is however another way to achieve the desired results without any spectral filtering, and so avoid the losses entailed. It was shown in [U’Ren05] that if the crystal spectral function has a particular form then its product with the pump spectral function can become separable, though neither of the two alone meets that condition. A simplified calculation is illustrated in Figs. 14 and 15.


Gaussian pulse pump function: $\alpha$, a Gaussian in $(\nu_s + \nu_i)$.

SPDC crystal’s phase matching function (PMF): $\phi$, a sinc function whose argument is proportional to the crystal length $L$ and to a combination of $\nu_s (k_p' - k_s')$ and $\nu_i (k_p' - k_i')$.

Approximate PMF (matching width of Gauss ≈ sinc): a Gaussian with the same argument,

where $k_p'$, $k_s'$, $k_i'$ are group velocity parameters of the crystal for pump, signal, idler photons.

Group velocity matched: if we let $k_p' = \frac{1}{2}(k_s' + k_i')$, then $\phi_{GVM}$ is a Gaussian in $(\nu_s - \nu_i)$, and the spectral function $\alpha \cdot \phi$ becomes separable and symmetric, IF the crystal length $L$ is set such that the pump bandwidth matches the width set by $L (k_s' - k_i')$.

Figure 15. Joint spectral functions for pump and ideal GVM case.
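
The separability condition can be checked numerically for a simplified Gaussian model. In the sketch below, the pump function is taken as a Gaussian in (ν_s + ν_i) and the group-velocity-matched phase matching function as a Gaussian in (ν_s - ν_i); the width parameters `pump_width` and `pmf_width`, the grid, and the function names are all assumptions introduced for illustration. A strictly positive function F factors as f(ν_s) g(ν_i) exactly when F(a,b) F(c,d) = F(a,d) F(c,b) for all sample points, which is the test used here.

```python
import math


def joint_spectrum(vs, vi, pump_width, pmf_width):
    """Assumed model: Gaussian pump in (vs + vi) times Gaussian PMF in (vs - vi)."""
    pump = math.exp(-((vs + vi) / pump_width) ** 2)
    pmf = math.exp(-((vs - vi) / pmf_width) ** 2)
    return pump * pmf


def is_separable(F, grid, tol=1e-12):
    """Check the rank-one condition F(a,b)F(c,d) == F(a,d)F(c,b) on a grid."""
    return all(abs(F(a, b) * F(c, d) - F(a, d) * F(c, b)) < tol
               for a in grid for b in grid for c in grid for d in grid)


grid = [x / 2 for x in range(-4, 5)]   # detunings from -2 to 2 in arbitrary units

# Matched widths: the cross terms in vs*vi cancel and the product factors
matched = is_separable(lambda s, i: joint_spectrum(s, i, 1.0, 1.0), grid)

# Mismatched widths: a residual vs*vi term correlates the two photons
mismatched = is_separable(lambda s, i: joint_spectrum(s, i, 1.0, 3.0), grid)

print(matched, mismatched)  # True False
```

When the two widths are equal the exponent reduces to a sum of a term in ν_s alone and a term in ν_i alone, which corresponds to the requirement that the crystal length be matched to the pump bandwidth.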

Rather than the most general case we consider the central one only: the exact group velocity matched case. This means simply that the crystal’s dispersive parameters are such that the pump pulse velocity matches the (type-II) photon pair’s average velocity. Several experiments [Grice01] were able to demonstrate such states with selected nonlinear crystals in the 1.5 $\mu$m regime. No known crystals enable GVM for applications at ~800 nm or shorter wavelengths, where much of the quantum optics work is centered, and where photon detectors exhibit the highest quantum efficiency (>90%) without cryogenic operation. Accordingly, the focus of this work is to demonstrate how GVM crystals at arbitrary wavelength ranges can be “synthesized” by properly combining segments of known crystals. The physical implementation of a GVM crystal is described in section 4.6.

3.7  Entangled  photon  sources 

Standard type-I and type-II SPDC crystals are still the leading technology for the production of
high mode quality photons used in quantum optics experiments. In these sources entangled
photon pairs are emitted as high energy pump photons pass through a nonlinear crystal.
Multipartite states of four or more entangled photons are generated by employing several crystals
or multiple passes through a single crystal, since typically only one pair is produced per pass.
Many groups, including our in-house team, are striving to overcome this limitation. Herein we
describe our novel, compact multipli-entangled photon source (designated simply as “Schioedtei”
henceforth) crystal for type-II SPDC, which produces six pairs of photons, surpassing the typical
generation of a single pair of entangled photons per pass in conventional SPDC-based sources.

The Schioedtei design is an adaptation of a typical type-II SPDC source. Schioedtei consists of a
pair of type-II non-collinear phase-matched SPDC crystals cut for degenerate down-conversion
whose optic axes are rotated orthogonal with respect to one another, as in Figure 16.


[Figure 16 image: crystal pair with the incident axis indicated.]

Figure 16. Type-II SPDC Schioedtei crystal assembly.


When the crystal pair is excited with an incident 45° polarized pump beam, one pair of rings is
produced from each of the type-II crystals. Each pair of rings is orthogonal to the other, resulting
in 12 intersection points (or simply “points”) where indistinguishable photons are produced.
Referring to Fig. 17, the indicated points marked 5, 6 and 7, 8 are the typical Bell states,

$|\Psi^B\rangle_{5,6\,(7,8)} = \frac{1}{\sqrt{2}}\left(|HV\rangle_{56\,(78)} \pm e^{i\varphi}|VH\rangle_{56\,(78)}\right)$,

with one pair arising from crystal 1, and the second pair produced from crystal 2. The points
indicated by 1, 2, 3, 4 are the product of two Bell states,

$|\Psi^P\rangle_{1,2,3,4} = \frac{1}{2}\left(|HV\rangle_{14} + e^{i\varphi}|VH\rangle_{14}\right)\left(|HV\rangle_{23} + e^{i\varphi}|VH\rangle_{23}\right)$,

produced from independent entangled photon pairs emerging from crystals 1 and 2 concurrently.
Points 9, 11 and 10, 12 are the $|VV\rangle_{9,11}$ and $|HH\rangle_{10,12}$ product states,
produced from photons from crystals 1 and 2 concurrently. The experimental implementation,
construction, and results will be described in section 4.7.


[Figure 17 legend: polarization markers for the $|HH\rangle$, $|VV\rangle$, and $|HV\rangle$ points.]


Figure  17.  Type-II  SPDC  Schioedtei  source.  See  text  for  discussion  of  the  intersection 
points  of  the  overlapping  rings. 


4.0  RESULTS  AND  DISCUSSION 


4.1  Grover’s  quantum  search  algorithm:  simulation 

In  Fig.  18  and  Fig.  19  we  illustrate  the  basic  utilization  of  the  device  GPU  compute  cores  within  a 
parallel  MPI  code  running  on  the  host  CPU.  In  Fig.  20  and  Fig.  21  we  illustrate  how  this 
methodology  was  employed  for  a  parallel  Grover  search  algorithm  (GSA)  simulation. 


Host: C driver code

    extern "C"
    void run_kernel()
    {
        int i, array1[6], array2[6], array3[6], *devarray1, *devarray2, *devarray3;

        /* Fill the two host arrays */
        for (i = 0; i < 6; i++)
        {
            array1[i] = i;
            array2[i] = 3 - i;
        }

        /* Allocate the three device arrays */
        cudaMalloc((void **) &devarray1, sizeof(int)*6);
        cudaMalloc((void **) &devarray2, sizeof(int)*6);
        cudaMalloc((void **) &devarray3, sizeof(int)*6);

        /* Copy the host arrays to the device */
        cudaMemcpy(devarray1, array1, sizeof(int)*6, cudaMemcpyHostToDevice);
        cudaMemcpy(devarray2, array2, sizeof(int)*6, cudaMemcpyHostToDevice);

        kernel<<<2, 3>>>(devarray1, devarray2, devarray3);  // Call Device compute code

        /* Copy the result back to the host */
        cudaMemcpy(array3, devarray3, sizeof(int)*6, cudaMemcpyDeviceToHost);

        for (i = 0; i < 6; i++)
        {
            printf("%d ", array3[i]);
        }
        printf("\n");

        cudaFree(devarray1);
        cudaFree(devarray2);
        cudaFree(devarray3);
    }

Device: CUDA kernel.cu compute code

    #include <stdio.h>

    __global__ void kernel(int *array1, int *array2, int *array3)
    {
        int index = blockIdx.x * blockDim.x + threadIdx.x;
        array3[index] = array1[index] + array2[index];
    }

Figure 18. Host C driver code and GPU device compute code.

Host: MPI wrapper/driver code

    // mpi.c
    #include <mpi.h>

    void run_kernel();

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* starts MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */

        run_kernel();  // Call Host C driver code that will invoke Device compute code

        MPI_Finalize();
        return 0;
    }

Host: MPI/CUDA compilation

    // compilation with nvcc and mpicc
    $ nvcc -c kernel.cu
    $ mpicc -o mpicuda mpi.c kernel.o -lcudart -L /usr/local/cuda/lib -I /usr/local/cuda/include

Host: MPI output

    // output
    $ mpirun -l -np 10 ./mpicuda
    1: 3 3 3 3 3 3
    9: 3 3 3 3 3 3
    8: 3 3 3 3 3 3
    2: 3 3 3 3 3 3
    7: 3 3 3 3 3 3
    6: 3 3 3 3 3 3
    0: 3 3 3 3 3 3
    4: 3 3 3 3 3 3
    5: 3 3 3 3 3 3
    3: 3 3 3 3 3 3

Figure 19. MPI wrapper/driver code, compilation and output.

The example we consider is the addition of two arrays (of length six for purposes of illustration)
such that their sum is equal to three for each entry. In Figure 18 the host C code (running on the
CPU) creates two arrays array1 and array2 with values {0,1,2,3,4,5} and {3,2,1,0,-1,-2},
respectively. Once these arrays are filled, memory for the device arrays devarray1, devarray2 and
devarray3 is allocated with the CUDA command cudaMalloc (where devarray3 will hold the
sum of the first two arrays). The command cudaMemcpy copies the contents of the host arrays
array1 (array2) into the device arrays devarray1 (devarray2) using the directive
cudaMemcpyHostToDevice. The kernel command then calls the device compute code kernel.cu
running on the GPU. Our system had two GPUs per CPU. The launch configuration kernel<<<2, 3>>>
requests two thread blocks of three threads each, so that six invocations of the compute code
kernel.cu run in total, one per array element. Note that each element of the arrays array1 and
array2 is sent to a single core, where the compute code kernel.cu (shown in green in Fig. 18) adds
them, and stores the result in devarray3 (locally named array3 on the device). That is, there is
no sum over the counter index in kernel.cu. CUDA automatically sends each index of array1 and
array2 to as many of the (massive number of) device cores on the GPU as are necessary. On each
core, operations are performed (here a simple addition) on the single indexed item. This
computational methodology is in keeping with the spirit of massive multi-core graphics processing
for which the GPUs were originally designed. After execution of the device compute code kernel.cu
the final cudaMemcpy copies the contents of devarray3 back into the host array array3, this time
using the directive cudaMemcpyDeviceToHost. The memory for the device arrays is then released.
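The index arithmetic used by the kernel in Fig. 18 can be emulated in plain C, without a GPU. The following is a minimal sketch under stated assumptions: the function names add_arrays_emulated and run_emulated_example are hypothetical and do not appear in the original code, and the two nested loops merely stand in for the thread blocks and threads that CUDA would run concurrently.

```c
#include <stdio.h>

/* Serial stand-in for the launch kernel<<<num_blocks, threads_per_block>>>:
 * the outer loop plays the role of blockIdx.x and the inner loop the role
 * of threadIdx.x. On a GPU these iterations would run concurrently. */
void add_arrays_emulated(const int *a, const int *b, int *c,
                         int num_blocks, int threads_per_block)
{
    for (int block = 0; block < num_blocks; block++) {
        for (int thread = 0; thread < threads_per_block; thread++) {
            /* Same global index as computed in kernel.cu */
            int index = block * threads_per_block + thread;
            c[index] = a[index] + b[index];
        }
    }
}

/* Fills the two input arrays as in Fig. 18, adds them with 2 blocks of
 * 3 threads, stores the six sums in out, and prints them. */
void run_emulated_example(int *out)
{
    int array1[6], array2[6];
    for (int i = 0; i < 6; i++) {
        array1[i] = i;      /* {0, 1, 2, 3, 4, 5}   */
        array2[i] = 3 - i;  /* {3, 2, 1, 0, -1, -2} */
    }
    add_arrays_emulated(array1, array2, out, 2, 3);
    for (int i = 0; i < 6; i++) {
        printf("%d ", out[i]);
    }
    printf("\n");
}
```

Calling run_emulated_example with a six-element output array fills every entry with the value three, matching the per-rank output listed in Fig. 19.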

So far, the code shown in Fig. 18 is serial, running on one CPU. In Fig. 19 we illustrate how this
serial code can be embedded in an MPI wrapper to run on many host CPUs. Most of the MPI
code simply initiates parallel communication between the set of CPUs, by creating the network
called MPI_COMM_WORLD, and giving each CPU in the communication network a rank,
which is assigned at launch, where the number of processors (CPUs) is stated with the -np
argument of mpirun (rank = {0,1,...,NCPUs-1}). The only non-trivial code portion in Fig. 19 is the
command run_kernel, which executes the serial C code in Fig. 18 separately on each of the NCPU
processors. Since there is no parallel communication requested in Fig. 19, the MPI code simply
acts as a wrapper for the C code in Fig. 18, and the output shown in the figure is simply
replicated on each processor (the rank is the integer to the left of the colon, and the
contents of array3 on each CPU are {3,3,3,3,3,3}). Also shown in Fig. 19 is the joint compilation
of the CUDA/MPI code using the CUDA compiler nvcc and the MPI compiler mpicc.

[Figure 20 panel text:]

•  Curse of dimensionality: for n qubits, the quantum state $|\psi\rangle = \sum_x c_x |x\rangle$ is an $N=2^n$ dimensional vector
•  The unitary operator that transforms the quantum state, $|\psi'\rangle = U|\psi\rangle$, is (worst case) a $2^n \times 2^n$ matrix
•  Illustrative case: n=3, $N=2^3=8$, NP=3, with local update array_local[i_local] = -array_local[i_local] + 2*average;
•  Different quantum circuit (U) decompositions: serial-like decomposition (8-bit quantum carry-ripple adder) and parallel-like decomposition (8-bit quantum carry-lookahead adder)
•  Developed hybrid MPI/CUDA Grover algorithm on Horus cluster
   -  straightforward: 30 qubits (no shared memory), 8 CPUs
   -  putting pieces together for more sophisticated computation (CONDOR)

Figure  20.  Schematic  of  hybrid  MPI/CUDA  parallel  simulation. 

Host MPI GSAtest code portion indicating which portion to send to GPU device

    // Begin Grover iteration loop
    for (it = 1; it <= N_grover; it++) {

        // Step 1: tag solution with minus sign
        if (rank == rank_soln) array[i_soln] = -array[i_soln];

        // compute "average"
        avg = 0.0;
        for (i_local = 0; i_local < N_local; i_local++) {
            avg += array[i_local];
        }
        avg /= N;

        // reduce this across all procrs with MPI_Allreduce
        MPI_Allreduce(MPI_IN_PLACE, &avg, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        // Step 2: inversion about the mean
        for (i_local = 0; i_local < N_local; i_local++) {
            array[i_local] = -array[i_local] + 2*avg;
        }
    }

Send this local Host inversion-about-mean computation to the GPU Device to evaluate
index-(i)-by-index on multiple cores

    // kernel.cu
    #include <stdio.h>

    __global__ void kernel(float *array1, int N, float avg)
    {
        int index = blockIdx.x * blockDim.x + threadIdx.x;
        if (index < N)
        {
            array1[index] = -array1[index] + 2*avg;
        }
    }

Figure 21. Grover host C driver code and GPU device compute code.

In Fig. 20 we illustrate the layout of the distributed vector $|\psi\rangle$ of length $N=2^n$
across Nprocrs CPUs in an MPI code. As in Section 3.1 we use the simple example of n=3 qubits for
an N=8 element array, and consider Nprocrs=3 CPUs. Each CPU holds floor(N/Nprocrs) elements, with
the remaining R = N - Nprocrs*floor(N/Nprocrs) < Nprocrs elements being distributed in round robin
fashion to the CPUs with rank 0 thru R-1 (here {3,3,2} elements to the three CPUs). The question
now arises as to what computation to perform on the host CPUs and what computation warrants the
cost of memory copy between the host and device.
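The distribution of the N state-vector elements over the CPUs described above can be sketched in C as follows. The function names local_count and local_offset are hypothetical, and the sketch assumes that each rank stores its elements as one contiguous block in rank order, which the text does not state explicitly.

```c
/* Number of state-vector elements held by the CPU with the given rank when
 * n_total elements are spread over n_procs CPUs: every CPU gets the floor
 * share, and the R = n_total % n_procs leftover elements go one each to
 * ranks 0 .. R-1. */
int local_count(int n_total, int n_procs, int rank)
{
    int base = n_total / n_procs;        /* floor(N / Nprocrs) */
    int remainder = n_total % n_procs;   /* R */
    return base + (rank < remainder ? 1 : 0);
}

/* Global index of the first element held by the given rank, assuming the
 * elements are stored in contiguous blocks in rank order. */
int local_offset(int n_total, int n_procs, int rank)
{
    int base = n_total / n_procs;
    int remainder = n_total % n_procs;
    /* Ranks below the remainder each hold one extra element */
    return rank * base + (rank < remainder ? rank : remainder);
}
```

For N=8 and Nprocrs=3 the counts are {3,3,2}, as in the example above, with the three blocks starting at global indices 0, 3 and 6.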

The host code in Fig. 21 is written purely with MPI in order to perform the $U_{inv}$
unitary of the Grover iterate $G=U_{inv}U_f$. This involves (i) computing the average avg of the
quantum amplitudes (the entries of the array) and then (ii) performing the inversion about the
mean, which sends array[i_local] to -array[i_local] + 2*avg. Since the full array $|\psi\rangle$ is
distributed across all CPUs in the local array array[*], the computation (i) of the average avg is
best done on the host CPUs using the MPI distributed communication call MPI_Allreduce. Each
CPU sums all its elements locally and divides the result by the global value N. The MPI call
MPI_Allreduce with the argument MPI_SUM sums the local values from all the CPUs and then
redistributes the result to each processor.

Computation (ii), which uses avg computed in (i) to perform the actual inversion about the mean
on each local array array[*], can be performed index by index. As indicated by the device code in
Fig. 21, this can be performed on the GPU compute nodes. For each CPU, each array element
array[i_local] is copied to a core on a GPU where the update array[i_local] to
-array[i_local] + 2*avg is computed, and then returned to the host array[*]. In this illustrative
example of N=8, the small amount of computation performed on the device core does not warrant
the latency cost of copying data between the host CPU and the compute device on the GPU.
However, for a general GSA calculation with n qubits, the array array[*] holds on the order of
$2^n$/Nprocrs elements (where Nprocrs is the number of CPUs), and the calculation does indeed
have high arithmetic intensity, enough to warrant the memory copy latency.
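The two computations (i) and (ii) can be combined into a serial C sketch that emulates the distributed calculation on a single processor. Everything here is illustrative: the function names are hypothetical, the loop over ranks stands in for the per-CPU partial sums combined by MPI_Allreduce, and the amplitudes are kept unnormalized (all set to 1 initially, which is permissible because the iteration is linear) so that no square root is needed.

```c
#include <stdlib.h>

/* One Grover iteration on a vector of n_total real amplitudes that is
 * conceptually split into n_procs contiguous chunks. Summing the per-chunk
 * partial averages stands in for MPI_Allreduce with MPI_SUM; the final loop
 * is the part that Fig. 21 offloads to the GPU. */
void grover_iteration(double *amp, int n_total, int n_procs, int i_soln)
{
    /* Step 1: tag the solution with a minus sign */
    amp[i_soln] = -amp[i_soln];

    /* Computation (i): each "CPU" sums its chunk and divides by global N */
    double avg = 0.0;
    int start = 0;
    for (int rank = 0; rank < n_procs; rank++) {
        int count = n_total / n_procs + (rank < n_total % n_procs ? 1 : 0);
        double partial = 0.0;
        for (int i = start; i < start + count; i++) {
            partial += amp[i];
        }
        avg += partial / n_total;   /* emulated reduction */
        start += count;
    }

    /* Computation (ii): inversion about the mean, index by index */
    for (int i = 0; i < n_total; i++) {
        amp[i] = -amp[i] + 2.0 * avg;
    }
}

/* Runs n_iter iterations starting from the equal-amplitude state and
 * returns the probability of measuring the solution index, or -1.0 if
 * memory allocation fails. */
double grover_success_probability(int n_total, int n_procs,
                                  int i_soln, int n_iter)
{
    double *amp = malloc((size_t)n_total * sizeof *amp);
    if (amp == NULL) {
        return -1.0;
    }
    for (int i = 0; i < n_total; i++) {
        amp[i] = 1.0;   /* unnormalized equal amplitudes */
    }
    for (int it = 0; it < n_iter; it++) {
        grover_iteration(amp, n_total, n_procs, i_soln);
    }
    double norm2 = 0.0;
    for (int i = 0; i < n_total; i++) {
        norm2 += amp[i] * amp[i];
    }
    double prob = amp[i_soln] * amp[i_soln] / norm2;   /* normalize here */
    free(amp);
    return prob;
}
```

For the illustrative case N=8 on three emulated CPUs, two iterations raise the probability of the tagged element from 1/8 to 121/128, roughly 0.945.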

In the numerical studies we performed, we estimated that our local cluster would be able to reach
n=30 qubits, with $N=2^{30} \approx 10^9$ elements. The limiting factor is memory, since the
addition of every qubit doubles the memory storage requirement, so that the addition of 10 more
qubits increases the memory requirement by a factor of roughly $10^3$. It should be noted that
this estimate represents only the storage of the state vector, held in 1-dimensional arrays
distributed across all the CPUs. For a quantum computation involving general unitary operations
U of size $2^n \times 2^n$ the memory requirements are quadratically increased, thereby lowering
the effective number of qubits that can be simulated in the quantum computation.
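The memory scaling described above can be made concrete with a small C sketch. The function names are hypothetical, and the bytes per amplitude is a free parameter; the value 8 used in the note below assumes real double-precision amplitudes, as suggested by the MPI_DOUBLE type in Fig. 21.

```c
#include <limits.h>

/* Bytes needed to hold the 2^n amplitudes of an n-qubit state vector.
 * Returns 0 when the result does not fit in an unsigned long long. */
unsigned long long state_vector_bytes(unsigned n_qubits,
                                      unsigned long long bytes_per_amplitude)
{
    if (n_qubits >= sizeof(unsigned long long) * CHAR_BIT) {
        return 0;   /* 2^n itself would overflow */
    }
    unsigned long long count = 1ULL << n_qubits;   /* N = 2^n */
    if (bytes_per_amplitude != 0 &&
        count > ULLONG_MAX / bytes_per_amplitude) {
        return 0;   /* the product would overflow */
    }
    return count * bytes_per_amplitude;
}

/* Bytes needed to hold a dense 2^n x 2^n operator: the same as a state
 * vector on 2n qubits, which is the quadratic increase noted in the text. */
unsigned long long dense_unitary_bytes(unsigned n_qubits,
                                       unsigned long long bytes_per_entry)
{
    return state_vector_bytes(2 * n_qubits, bytes_per_entry);
}
```

With 8 bytes per amplitude, n=30 requires 8 GiB for the state vector alone, n=40 requires 1024 times as much, and a dense unitary on only 15 qubits already costs as much as the 30-qubit state vector.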

For general quantum computations there is another, more subtle issue to address. As indicated in
Fig. 20, the specific decomposition of a general n-qubit quantum circuit (a unitary designed to
carry out a specific task) into smaller 1-qubit, 2-qubit and 3-qubit operations (which are much
easier to implement physically) has a large effect on whether or not the circuit is amenable to
parallel simulation. Each horizontal line represents the time evolution of a qubit. The vertical
lines with k dots indicate a sub-unitary operation on k qubits. Each circuit represents an 8-bit
add operation (requiring additional ancilla qubits for intermediate computations). The top
diagram represents a decomposition of the circuit as an 8-bit carry-ripple adder. Its V-like
horizontal decomposition, with at most two sub-unitary operations per vertical slice, forces a
serial-like execution. The bottom diagram represents a decomposition of the circuit as an 8-bit
carry-lookahead adder. Here, each vertical slice contains many separate sub-unitary operations on
different collections of qubits. This latter decomposition is much more amenable to parallel
execution. Currently, there is great research interest in the most efficient decomposition of a
general n-qubit unitary into the fewest number of one, two or three qubit sub-unitary operations.
However, the most efficient decomposition may not necessarily be the one most amenable to
parallel implementation.

4.2 Grover’s quantum search algorithm: theory

General amplitude amplification


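It is well known [Boyer96, Kaye07] that the Grover iteration (2) can be extended to the more
general case

$G = U_{\psi^\perp} U_f = A\,U_{0^\perp} A^\dagger\, U_f$,   (12)

where A is the operator that takes the standard initial state $|0\rangle$ on n qubits to an initial
“guess” state, often taken to be the equal amplitude, unbiased state
$|\psi\rangle = \frac{1}{\sqrt{N}} \sum_x |x\rangle$,

$A|0\rangle = A \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{N} \\ 1/\sqrt{N} \\ \vdots \\ 1/\sqrt{N} \end{pmatrix} \equiv |\psi\rangle \;\Rightarrow\; A = \begin{pmatrix} 1/\sqrt{N} & \cdots \\ 1/\sqrt{N} & \cdots \\ \vdots & \\ 1/\sqrt{N} & \cdots \end{pmatrix}$.   (13)

From the left hand side of (13), we see that the initial state $|0\rangle$ picks out the first
column of A, which is $|\psi\rangle$. The rest of the columns of A can be chosen arbitrarily,
subject only to the restriction that A is unitary, $AA^\dagger = I_N$, requiring that all columns
(and all rows) are mutually orthonormal. Note that in the standard Grover iteration (2)
$A = H^{\otimes n}$. Further, $A|0\rangle = |\psi\rangle$ ensures that
$U_{\psi^\perp} = A\,U_{0^\perp} A^\dagger = 2|\psi\rangle\langle\psi| - I_N$ is again the
inversion about the mean operator (8).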
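As a short check of the last statement, the action of $2|\psi\rangle\langle\psi| - I_N$ on a general vector can be written out component by component, assuming $|\psi\rangle$ is the equal amplitude state. The symbols $a_i$ (the amplitudes of an arbitrary vector $|a\rangle$) and $\bar{a}$ (their average) are introduced here only for illustration.

```latex
\begin{align*}
\langle \psi | a \rangle
  &= \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} a_j = \sqrt{N}\,\bar{a},
  \qquad \bar{a} \equiv \frac{1}{N} \sum_{j=0}^{N-1} a_j, \\
\big[ \left( 2|\psi\rangle\langle\psi| - I_N \right) |a\rangle \big]_i
  &= \frac{2}{\sqrt{N}}\,\sqrt{N}\,\bar{a} - a_i
   = 2\bar{a} - a_i .
\end{align*}
```

This is the same update, array[i_local] to -array[i_local] + 2*avg, that is used in the simulation code of Section 4.1.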


The  quantum  database  state  and  the  subspace  phase  kickback  operation 

We developed a quantum database state $|\psi_{db}\rangle$, this time in the quantum circuit model
approach in which we explicitly state the unitary evolution operators (vs the AdQC approach, in
which the focus is on the constructed Hamiltonians). To describe our approach we again utilize the
example of a telephone directory database. We will encode both the names and the telephone numbers
into quantum states, and illustrate our implementation explicitly with the example utilizing n=2
qubits for the names and n=2 qubits for the telephone numbers, while concurrently developing
formulas for an arbitrary number n of qubits. We consider the case of $N=2^n$ (name, telephone
number) pairs $\{x_i, t_i\}$. The quantum database state $|\psi_{db}\rangle$ is given by

$|\psi_{db}\rangle = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |x_i\rangle_x \otimes |t_i\rangle_t$,   (14)

which is, in general, an entangled state between the name and telephone component states.

Note that $|\psi_{db}\rangle$ is an N-dimensional vector in an $N^2$ dimensional Hilbert space,
where the most general state is given by

$|\Psi\rangle = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} b_{Ni+j}\, |i\rangle_x \otimes |j\rangle_t = \sum_{k=0}^{N^2-1} b_k\, |k\rangle_{xt} \in \mathcal{H}_x \otimes \mathcal{H}_t$,   (15)

where $\mathcal{H}_x$ and $\mathcal{H}_t$ are the N-dimensional Hilbert spaces of the names and
telephone numbers, respectively. In (14) and (15) we use a subscript notation to denote which
Hilbert space the ket belongs to: $|i\rangle_x \in \mathcal{H}_x$, $|j\rangle_t \in \mathcal{H}_t$
and $|k\rangle_{xt} \in \mathcal{H}_x \otimes \mathcal{H}_t$. Let us consider the specific example
of n=2, $N=2^n=4$, and utilize the decimal representation of the states (i.e.
$\{x_i, t_i\} \in [0,1,2,3]$). Consider a telephone directory and corresponding database state
given by


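    names   telephone #s
      0         2
      1         3
      2         0
      3         1

$|\psi_{db}\rangle = \frac{1}{\sqrt{4}}\left(|0\rangle_x|2\rangle_t + |1\rangle_x|3\rangle_t + |2\rangle_x|0\rangle_t + |3\rangle_x|1\rangle_t\right)$

$\qquad\;\; = \frac{1}{\sqrt{4}}\left(|2\rangle_{xt} + |7\rangle_{xt} + |8\rangle_{xt} + |13\rangle_{xt}\right)$, where $|x_i\rangle_x |t_i\rangle_t \equiv |N x_i + t_i\rangle_{xt}$.   (16)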


Note that while we have ordered the names in (16) sequentially, the two lists of names and
telephone numbers can in general be chosen as random permutations of the integers [0,1,...,N-1].

Our rationale for constructing the database state $|\psi_{db}\rangle$ is simple. Given the
telephone directory (the database) we, as the eventual searcher (database interrogator), can
encode this classical information into the quantum state $|\psi_{db}\rangle$ and store it for
subsequent interrogation. Suppose at a later time, we select (or are provided with) a random
telephone number $t^*$, and desire to find the associated corresponding name $x^*$. We can then
construct the phase kickback, telephone number tagging operator $U^{ty}_{f_{t^*}}$ utilizing the
known information of the selected telephone number $t^*$. For example, if $t^*=2$, the operator
$U^{ty}_{f_{t^*}}$ would have the form given in Fig. 3 (with $x^*$ now replaced by $t^*$). Note
that $U^{ty}_{f_{t^*}}$ acts on the ty-subspace (indicated by the superscript) of telephone
numbers t and the auxiliary qubit y, and not on the x-subspace of names, on which we are seeking
the associated name $x^*$. Thus, in the full $2N^2$-dimensional Hilbert space
$\mathcal{H}_x^{(N)} \otimes \mathcal{H}_t^{(N)} \otimes \mathcal{H}_y^{(2)}$ (where the
superscript denotes the dimension of the Hilbert space) the telephone number tagging operator has
the following form, and operational PK effect

$U^{xty}_{f_{t^*}} = I_N \otimes U^{ty}_{f_{t^*}}$, where $f_{t^*}(t) = \delta_{t,t^*} = \begin{cases} 1 & t = t^* \\ 0 & t \neq t^* \end{cases}$,

$\left(I_N \otimes U^{ty}_{f_{t^*}}\right) |x\rangle_x \otimes |t\rangle_t \otimes \frac{|0\rangle_y - |1\rangle_y}{\sqrt{2}} = |x\rangle_x \otimes (-1)^{f_{t^*}(t)} |t\rangle_t \otimes \frac{|0\rangle_y - |1\rangle_y}{\sqrt{2}}$

$\qquad = (-1)^{f_{t^*}(t)}\, |x\rangle_x \otimes |t\rangle_t \otimes \frac{|0\rangle_y - |1\rangle_y}{\sqrt{2}}$.   (17)

Note that $U^{xty}_{f_{t^*}}$ performs an effective sign flip on states
$|x\rangle_x \otimes |t^*\rangle_t$ for all values of x. Due to the tensor product nature of the
component states, a PK sign flip on $|t^*\rangle_t$ produces an effective PK sign flip on
$|x\rangle_x \otimes |t^*\rangle_t$, which includes the sought after state
$|x^*\rangle_x \otimes |t^*\rangle_t$. We next describe the construction of the database state and
the Grover operator.


Encoding  the  database  into  the  quantum  database  state 

From (13) we need to construct a unitary operator $A \equiv A_{xt}$ such that

$A\,|0\rangle_x \otimes |0\rangle_t \equiv A\,|0\rangle_{xt} = |\psi_{db}\rangle_{xt}$.   (18)

This is most easily accomplished if we perform a relabeling of the indices of the xt component
states in the lower, rightmost line of (16) so that we bring them to the first N entries of the
$N^2$ database vector, i.e.

$|\psi_{db}\rangle = \frac{1}{\sqrt{4}}\left(|2\rangle_{xt} + |7\rangle_{xt} + |8\rangle_{xt} + |13\rangle_{xt}\right) \rightarrow \frac{1}{\sqrt{4}}\left(|0'\rangle_{xt} + |1'\rangle_{xt} + |2'\rangle_{xt} + |3'\rangle_{xt}\right) \equiv |\psi'_{db}\rangle$,

which we will call the prime frame, and which we denote by primes on the component values. In the
prime frame, the database state has the form
$|\psi'_{db}\rangle = \frac{1}{\sqrt{4}}[1,1,1,1,0,\ldots,0]^T$ and we seek a unitary operator $A'$
with the property that $A'|0'\rangle = |\psi'_{db}\rangle$. Though there is much freedom in
choosing such an $A'$, the simplest, most direct (though non-unique) choice that we adopt here is
to choose $A' = H_N \equiv H^{\otimes n}$, the n-fold tensor product of 2x2 single qubit Hadamard
unitaries. In the prime frame $A'$ takes the block diagonal direct sum form,

$A' = H_N \oplus I_{N^2-N}$,   (19)

(see Fig. 22), in which $I_{N^2-N}$ is the $(N^2-N) \times (N^2-N)$ identity matrix acting on those
states $|i\rangle_x \otimes |j\rangle_t$ not in $|\psi_{db}\rangle_{xt}$.

[Figure 22 image: block diagonal matrix with an N x N Hadamard block $H_N$ in the upper left, an
$(N^2-N) \times (N^2-N)$ identity block in the lower right, and zeros elsewhere.]

Figure 22. Form of the unitary $A'$ operator, effecting the operation
$A'|0'\rangle = |\psi'_{db}\rangle$, in the prime index ordering.


We can transform $A'$ back to the original unprimed frame by a series of $N^2 \times N^2$ unitary
operations $S_{ij} = S_{ij}^\dagger$ that swap rows i and j of any matrix. In our particular
example (16) we have

$|\psi_{db}\rangle = \frac{1}{\sqrt{4}}\left(|2\rangle_{xt} + |7\rangle_{xt} + |8\rangle_{xt} + |13\rangle_{xt}\right) \;\Rightarrow\; \{k_0, k_1, k_2, k_3\} = \{2, 7, 8, 13\}$,

$|\psi'_{db}\rangle = \frac{1}{\sqrt{4}}\left(|0'\rangle_{xt} + |1'\rangle_{xt} + |2'\rangle_{xt} + |3'\rangle_{xt}\right) \;\Rightarrow\; \{k'_0, k'_1, k'_2, k'_3\} = \{0, 1, 2, 3\}$,

$A = \prod_{i} S_{k'_i, k_i}\, A'$, with $(S_{ij})_{kl} = \begin{cases} 1 & (k,l) \in \{(i,j),(j,i)\} \text{ or } k = l \notin \{i,j\} \\ 0 & \text{otherwise} \end{cases}$   (20)

which is illustrated in Fig. 23. The swap operator $S_{ij}$ acting on a quantum state vector
effectively performs a Pauli bit-flip operation X between the ith and jth components, and
therefore, acting on a matrix, $S_{ij}$ swaps the ith and jth rows. Note that we perform the row
swaps from the prime to the unprimed frame beginning with the largest value $i' = N-1$, backwards
to the smallest value $i' = 0$.

Figure 23. Successive row swapping operations to transform $A'$ in the prime frame to A in
the unprimed frame for the specific telephone database example in (20).

Construction  of  the  Grover  iteration 

The construction of the Grover iteration is most clearly described in the prime frame (note that in
the numerical examples discussed below, the simulations are carried out in the unprimed frame)
where it takes the form

$G'^{xt} = U'^{xt}_{\psi'^{\perp}_{db}}\, U'^{xt}_{f_{t^*}}$

$\qquad = A'^{xt}\, U^{xt}_{0^\perp}\, A'^{xt\,\dagger}\, U'^{xt}_{f_{t^*}}$   (21)

$\qquad = \left[\left(2|\psi'_{db}\rangle_{xt}\langle\psi'_{db}| - I_N\right) \oplus I_{N^2-N}\right]\left(I_N \otimes U^{t}_{f_{t^*}}\right)$.

Equation (21) is illustrated in Fig. 24 in the prime frame (dropping primes in the figure for visual
clarity) acting on $|\psi'_{db}\rangle_{xt}$.

Figure 24. Illustration of the action of the Grover iteration (21) in the primed frame.

Note that in each NxN block $U^{t}_{f_{t^*}}$ performs a sign flip
$U^{t}_{f_{t^*}} |t\rangle_t = (-1)^{f_{t^*}(t)} |t\rangle_t$ conditioned on the selected telephone
number $t^*$, independent of the value of x. The operator
$U'^{xt}_{\psi'^{\perp}_{db}} = 2|\psi'_{db}\rangle_{xt}\langle\psi'_{db}| - I_N$ (where we have
suppressed the subscripts in the figure) performs an inversion about the mean on the N components
of $|\psi'_{db}\rangle$.

The net effect of (21) is that on the first N components of the primed database state
$|\psi'_{db}\rangle$, we effect a Grover iteration of the original form in (2). On the latter
$N^2-N$ components of $|\psi'_{db}\rangle$ we perform the operation
$|x \neq x^*\rangle_x |t^*\rangle_t \mapsto -|x \neq x^*\rangle_x |t^*\rangle_t$ in each NxN block,
followed by the multiplication by the NxN identity matrix $I_N$. However, since the latter $N^2-N$
components of $|\psi'_{db}\rangle_{xt}$ are initially zero (which we will generalize below), they
remain zero after the Grover iteration. Thus, after $\lfloor \pi\sqrt{N}/4 \rfloor$ Grover
iterations, the amplitude of the state $|x^*\rangle_x \otimes |t^*\rangle_t$, lying somewhere in
the first N (of the $N^2$) components of $|\psi'_{db}\rangle$, will be driven to a magnitude
$\sqrt{N-1}/\sqrt{N} \sim O(1)$ with probability $O(1-1/N)$ for detection upon measurement.
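The database search described above can be sketched in C for real amplitudes indexed by the combined index k = N*x + t. This is an illustrative sketch rather than the simulation code used for the figures: the function names are hypothetical, the inversion about the mean is applied only to the N database components (the remaining components are only touched by the tagging step, which leaves all probabilities as described in the text), and the amplitudes are kept unnormalized and normalized at the end.

```c
#include <stdlib.h>

/* One iteration of the database search on a vector of N*N real amplitudes
 * indexed by k = N*x + t. db_index lists the N combined indices that make up
 * the database state; t_star is the selected telephone number. */
void database_grover_iteration(double *amp, int N,
                               const int *db_index, int t_star)
{
    /* Tagging: flip the sign of every |x>|t*> component, for all names x */
    for (int x = 0; x < N; x++) {
        amp[N * x + t_star] = -amp[N * x + t_star];
    }
    /* Inversion about the mean, restricted to the N database components */
    double avg = 0.0;
    for (int j = 0; j < N; j++) {
        avg += amp[db_index[j]];
    }
    avg /= N;
    for (int j = 0; j < N; j++) {
        amp[db_index[j]] = -amp[db_index[j]] + 2.0 * avg;
    }
}

/* Builds the database state for the pairs (x, phone[x]), applies n_iter
 * iterations, and returns the name x of the most probable component.
 * The probability of that component is written to *prob.
 * Returns -1 if memory allocation fails. */
int database_search(const int *phone, int N, int t_star,
                    int n_iter, double *prob)
{
    double *amp = calloc((size_t)N * (size_t)N, sizeof *amp);
    int *db_index = malloc((size_t)N * sizeof *db_index);
    if (amp == NULL || db_index == NULL) {
        free(amp);
        free(db_index);
        return -1;
    }
    for (int x = 0; x < N; x++) {
        db_index[x] = N * x + phone[x];
        amp[db_index[x]] = 1.0;   /* unnormalized database amplitudes */
    }
    for (int it = 0; it < n_iter; it++) {
        database_grover_iteration(amp, N, db_index, t_star);
    }
    int best = 0;
    double norm2 = 0.0;
    for (int k = 0; k < N * N; k++) {
        norm2 += amp[k] * amp[k];
        if (amp[k] * amp[k] > amp[best] * amp[best]) {
            best = k;
        }
    }
    *prob = amp[best] * amp[best] / norm2;
    free(amp);
    free(db_index);
    return best / N;   /* name x of the component k = N*x + t */
}
```

For the directory in (16), with telephone numbers {2,3,0,1} and $t^*=2$, a single iteration returns the name x=0 with unit probability, as expected for N=4.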


Numerical  simulations  of  algorithm 

In Fig. 25 we show a simulation for the case of n=3 qubits where a pair of arrays of names and
telephones of size $N=2^n=8$ are chosen as random permutations of [0,1,...,N-1]. In the code, the
database state $|\psi_{db}\rangle_{xt} = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |k_i\rangle_{xt}$ is
constructed in the unprimed frame, and from the specific collection of N indices $\{k\}_{db}$ in
the database state, we construct A in (20) from $A'$, as discussed in (19) and illustrated in
Fig. 23. This allows us to construct $U^{xt}_{\psi^{\perp}_{db}}$. We next generate a random
telephone number $t^*$ and use it to construct the specific PK telephone tagging operator
$U^{xt}_{f_{t^*}}$. Subsequently, we assemble the Grover iteration
$G^{xt} = U^{xt}_{\psi^{\perp}_{db}} U^{xt}_{f_{t^*}}$ and apply it for
$\lfloor \pi\sqrt{N}/4 \rfloor = 2$ iterations.

Figure 25. Numerical simulation of Grover iterations (21) for n=3 qubits ($N=2^3=8$) with a
randomly selected telephone number $t^*=2$ with the initial database state.

The plots in Fig. 25 (left to right) show the probabilities (amplitude squared) for the
$\{0, 1, \lfloor \pi\sqrt{N}/4 \rfloor = 2\}$ iterations. The abscissa is the combined xt index
k=Nx+t ranging from 0 to $N^2-1=63$. The Grover iterations act on $|\psi_{db}\rangle$ and drive it
towards the state $|k^*=10\rangle_{xt} = |x^*=1\rangle_x \otimes |t^*=2\rangle_t$ with near unit
probability. Note that only N=8 of the total $N^2=64$ probabilities are non-zero throughout the
whole evolution, corresponding to the N=8 non-zero amplitudes of the database state
$|\psi_{db}\rangle_{xt}$.

Consideration of the initial state

It  is  illuminating  to  consider  the  action  of  our  Grover  iteration  Gxt  on  initial  states  y/  „„}  other 
than  the  constructed  database  state  |  ydh ) t .  In  general,  the  normalized  initial  state  could  be  written 
in  the  form 

I  Y,mt  )xI=4p  |  Ydb  )xt  +  VKp  |  Y „db  )x, ,  (22) 

where  \v„db)xl  denotes  a  normalized  non-database  state,  i.e.  the  state  formed  by  all  components 
not  in  the  database  state  \vdh)xt  ■  In  (22),  p  =  \{y/dh\ YinU)\  is  the  probability  to  find  the  initial  state  in 
the  database  state.  After  the  |  /4J  of  Gxt  the  final  state  is  approximately 

|  Y  final  )x,~4p  I  *  )x  \ f  ),  +  Vt-P  I  Yndb  )  xt  ,  (23) 

which  implies  there  is  only  a  probability  p  to  detect  the  sought  after  solution  state  |/|)  ^ .  Thus, 

as  long  as  p>l/2,  the  form  of  the  GSA  presented  here  does  better  than  its  classical  0(N/ 2) 
exhaustive  search. 

The  initial  state  (22)  might  occur  as  an  imperfect  attempt  to  construct  the  desired  database  state 
\y4t)xl .  A  simpler  state  to  form  is  \yn*)  t  = '/  ®  H?"  \  0)t  ®  1 0); ,  the  A2  equal 
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Figure  26.  Numerical  simulation  of  Grover  iteration  (21)  for  n= 3  qubits  (/V=2  ,=8)  with  a 
randomly  selected  telephone  number  t=  6  with  the  unbiased  N 2  initial  state. 


amplitude  state,  since  the  last  equality  shows  that  we  can  form  this  state  directly  by  the 
application  of  2/; -fold  tensor  product  of  Hadamards  H®"  <8>H®"  acting  on  the  tensor  product  of  the 
//-qubit  standard  name-state  1 0}  and  the  //-qubit  standard  telephone-state  |0)  However,  from 
(22)  and  (23)  such  an  initial  state  renders  the  GSA  worse  than  classical  exhaustive  search  since 
there  are  N 2 -N  non-database  states  each  with  probability  l/ Va^  that  are  unchanged  by  the  Grover 
iteration  Gxt  for  a  total  probability  for  the  detection  of  y/ndh  )xi  upon  measurement  given  by  1  - 1  IN. 

The  initial  probability  of  p=l/N  to  find  \yNl  ^  in  the  state |(////,)  ;  remains  the  final  probability  to 

find  '/%■„„,) ;  in  the  solution  state  x "'j  |f^.  Figure  26  illustrates  though  that  Gxt  does  act  only  on 

the  /V-component  state  \wM)xl  buried  within  N 2  sized  initial  state  i//v,  j  . 


Fig. 26 shows the numerical simulation of Grover iteration (21) for $n=3$ qubits ($N=2^3=8$) with a
randomly selected telephone number $t=6$ with initial state $|\psi_{init}\rangle = |\psi_{N^2}\rangle_{xt}$. The
plots (left to right) show the probabilities (amplitude squared) for successive iterations.
The abscissa is the combined $xt$ index $k = Nx + t$ ranging from 0 to $N^2 - 1 = 63$. The Grover iterations
act upon the $|\psi_{db}\rangle_{xt}$ portion of $|\psi_{N^2}\rangle_{xt}$, where $p = 1/N$, and drive it
towards the state $|k^* = 30\rangle = |x^* = 3\rangle_x |t = 6\rangle_t$ with probability $p = 1/8$. Note that all $N^2 = 64$
probabilities are non-zero throughout the whole evolution, but only the $N = 8$ amplitudes of the
database state $|\psi_{db}\rangle_{xt}$ are acted upon by $G_{xt}$ (compare with Fig. 17). Because the amplitudes in
the non-database portion are unchanged by $G_{xt}$, using the initial state $|\psi_{init}\rangle = |\psi_{N^2}\rangle_{xt}$
yields inferior performance when compared to classical exhaustive search. Eq. (22) and Eq. (23) argue
that one should use $p$ as close to 1 as possible, i.e. the initial state should be as close to
$|\psi_{db}\rangle_{xt}$ as is physically realizable.
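The behavior described above can be reproduced with a small numerical sketch. It is not the simulation code behind Fig. 26. It assumes (i) a hypothetical one-to-one phone book `t_of_x`, chosen here as a cyclic shift so that name $x^*=3$ maps to $t=6$, and (ii) that the Grover iteration acts as the usual oracle sign flip followed by inversion about the mean within the $N$-dimensional database subspace, and as the identity elsewhere. The function and variable names are illustrative only.

```python
import numpy as np

def grover_on_database(n=3, x_star=3, t_star=6, iterations=2):
    """Return (k_star, database indices, probability history) for the sketch above."""
    N = 2 ** n
    # Hypothetical one-to-one phone book x -> t(x): a cyclic shift with t(x_star) = t_star.
    t_of_x = [(x + t_star - x_star) % N for x in range(N)]
    db = np.array([N * x + t for x, t in enumerate(t_of_x)])  # combined index k = N*x + t
    k_star = N * x_star + t_star

    amp = np.full(N * N, 1.0 / N)      # unbiased N^2 equal-amplitude initial state
    history = [amp ** 2]
    for _ in range(iterations):
        sub = amp[db].copy()           # only the N database amplitudes are touched
        sub[x_star] *= -1.0            # oracle: flip the sign of the marked entry
        sub = 2.0 * sub.mean() - sub   # inversion about the mean within the database subspace
        amp[db] = sub
        history.append(amp ** 2)
    return k_star, db, np.array(history)

k_star, db, history = grover_on_database()
print(k_star, history[:, k_star])
```

Under these assumptions the probability at $k^*=30$ grows from $1/64$ to about 0.0977 and then 0.1182 over two iterations, staying below the bound $p = 1/N = 0.125$, while the 56 non-database probabilities remain at $1/64$ throughout.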


4.3  CNOT  gate  in  PTR  glass:  simulation 

In our research we considered both multiplexed and stacked volume hologram configurations
for constructing quantum gates. In [Miller11b] we detailed the use of multiplexing to simulate
quantum teleportation. One alternative to multiplexing is to make a single recording in each of
many holograms, and then stack the holograms. In this report we provide our design of a
quantum CNOT gate compatible with PTR glass. This gate is realized by stacking four
holograms, which we describe below. The CNOT gate is a two-qubit gate, so its state space is
4-dimensional. While this state space can be constructed as a
product space of qubits by utilizing the polarization states of two correlated photons, it can also
be represented by a single LM photon in a 4-dimensional state space. The CNOT gate can therefore be
constructed with a single photon. Following the arguments in Section 3.3, we freely choose four
independent plane waves lying on the cone shown in Fig. 27.

Figure 27. Volume holographic design of 4-dimensional CNOT gate in PTR glass.

We associate to these independent transverse LM modes the four orthogonal quantum state vectors $|S_1\rangle$,
$|S_2\rangle$, $|S_3\rangle$ and $|S_4\rangle$. Any quantum state vector $|\psi\rangle$ in this 4-dimensional state space can be written
as a linear superposition of these states,

$$|\psi\rangle = \alpha|S_1\rangle + \beta|S_2\rangle + \gamma|S_3\rangle + \delta|S_4\rangle, \qquad |\alpha|^2 + |\beta|^2 + |\gamma|^2 + |\delta|^2 = 1.$$  (24)

Each of our basis states can be expressed in matrix notation,

$$|S_1\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad |S_2\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad |S_3\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad |S_4\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$  (25)
In this computational basis the CNOT gate can be expressed by the following unitary
transformation

$$CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$  (26)


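If we let the z-axis be orthogonal to the face (x-y plane) of the hologram, the four volume
holographic gratings are recorded by a suitable superposition of the set of four signal plane
waves,

$$\langle \mathbf{r}|S_1\rangle = \exp(i\mathbf{k}_1 \cdot \mathbf{r}), \quad \langle \mathbf{r}|S_2\rangle = \exp(i\mathbf{k}_2 \cdot \mathbf{r}), \quad \langle \mathbf{r}|S_3\rangle = \exp(i\mathbf{k}_3 \cdot \mathbf{r}), \quad \langle \mathbf{r}|S_4\rangle = \exp(i\mathbf{k}_4 \cdot \mathbf{r}),$$  (27)

and four reference waves

$$\langle \mathbf{r}|R_1\rangle = \exp(i\boldsymbol{\kappa}_1 \cdot \mathbf{r}), \quad \langle \mathbf{r}|R_2\rangle = \exp(i\boldsymbol{\kappa}_2 \cdot \mathbf{r}), \quad \langle \mathbf{r}|R_3\rangle = \exp(i\boldsymbol{\kappa}_3 \cdot \mathbf{r}), \quad \langle \mathbf{r}|R_4\rangle = \exp(i\boldsymbol{\kappa}_4 \cdot \mathbf{r}),$$  (28)

as shown in Fig. 27.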

The hologram is recorded so that each row of the unitary matrix of the CNOT gate is used to
generate its own volume holographic grating. For a 2-qubit gate such as the CNOT gate we
would ordinarily require four recordings; however, since the first two rows are just an identity
matrix, we need only two layers to transform the signal states into the desired reference states. In
addition to one holographic recording per dimension of the state space, we also require the
conjugate of each grating (two in the case of the CNOT gate) in order to transform the diffracted
reference waves back into the desired signal states. In particular, the
CNOT gate is constructed from four holographic gratings stacked together as shown in Fig. 27:


1.  The first grating is recorded with the two coherent plane waves corresponding to states
$|S_3\rangle$ and $|R_4\rangle$.

2.  The second grating is recorded with the two coherent plane waves corresponding to states
$|S_4\rangle$ and $|R_3\rangle$.

3.  The third grating is recorded with the two coherent plane waves corresponding to states
$|R_4\rangle$ and $|S_4\rangle$.

4.  The fourth grating is recorded with the two coherent plane waves corresponding to states
$|R_3\rangle$ and $|S_3\rangle$.

The four gratings will not diffract the first two signal states $|S_1\rangle$ or $|S_2\rangle$. However, the first two
gratings redirect the two signal states $|S_3\rangle$ and $|S_4\rangle$ into $|R_4\rangle$ and $|R_3\rangle$, respectively, in accordance
with the Pauli X-gate,

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$  (29)


The first hologram is equivalent to the operator,

$$U_1 = |S_1\rangle\langle S_1| + |S_2\rangle\langle S_2| + |R_4\rangle\langle S_3| + |S_4\rangle\langle S_4|,$$  (30)

and the second hologram, recorded with the signal plane wave $\langle \mathbf{r}|S_4\rangle$, is equivalent to the
operator,

$$U_2 = |S_1\rangle\langle S_1| + |S_2\rangle\langle S_2| + |R_4\rangle\langle R_4| + |R_3\rangle\langle S_4|.$$  (31)

While these two recordings could have been made in a single multiplexed hologram, we recover
the same function by stacking the two together, thereby generating the CNOT operation,

$$U'_{CNOT} = U_2 U_1 = |S_1\rangle\langle S_1| + |S_2\rangle\langle S_2| + |R_3\rangle\langle S_4| + |R_4\rangle\langle S_3|.$$  (32)

However, the outputs of these two stacked holograms are the reference states $|R_3\rangle$ and
$|R_4\rangle$. In order to redirect these back to the proper signal states, we require the redirection
operator similar to Eq. (11). This can be accomplished by recording a third hologram with the
states $|R_3\rangle$ and $|S_3\rangle$. The third hologram is equivalent to the operator,

$$U_3 = |S_1\rangle\langle S_1| + |S_2\rangle\langle S_2| + |S_3\rangle\langle R_3| + |R_4\rangle\langle R_4|.$$  (33)

Similarly, the fourth hologram is recorded with the states $|R_4\rangle$ and $|S_4\rangle$ and is equivalent to the
operator,

$$U_4 = |S_1\rangle\langle S_1| + |S_2\rangle\langle S_2| + |S_3\rangle\langle S_3| + |S_4\rangle\langle R_4|.$$  (34)

Therefore, the combination of the four stacked volume holograms has the desired action, the
CNOT gate,

$$U_{CNOT} = U_4 U_3 U_2 U_1 = |S_1\rangle\langle S_1| + |S_2\rangle\langle S_2| + |S_4\rangle\langle S_3| + |S_3\rangle\langle S_4|.$$  (35)

One  can  apply  these  principles  to  design  a  universal  set  of  quantum  gates,  as  well  as  simple 
quantum  algorithms  such  as  QT. 
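The operator algebra of Eqs. (30) to (35) can be checked numerically. The sketch below is only a consistency check, not a model of the holograms: it assumes an 8-dimensional space spanned by the four signal states and four reference states (the ordering and the helper names `ketbra` and `hologram` are illustrative), and represents each hologram as a sum of outer products.

```python
import numpy as np

# Assumed basis ordering: four signal states followed by four reference states.
LABELS = ["S1", "S2", "S3", "S4", "R1", "R2", "R3", "R4"]
IDX = {name: i for i, name in enumerate(LABELS)}

def ketbra(out_state, in_state):
    """Outer product |out><in| in the 8-dimensional signal/reference space."""
    m = np.zeros((8, 8))
    m[IDX[out_state], IDX[in_state]] = 1.0
    return m

def hologram(pairs):
    """Sum of |out><in| terms, one per (out, in) pair."""
    return sum(ketbra(o, i) for o, i in pairs)

U1 = hologram([("S1", "S1"), ("S2", "S2"), ("R4", "S3"), ("S4", "S4")])  # Eq. (30)
U2 = hologram([("S1", "S1"), ("S2", "S2"), ("R4", "R4"), ("R3", "S4")])  # Eq. (31)
U3 = hologram([("S1", "S1"), ("S2", "S2"), ("S3", "R3"), ("R4", "R4")])  # Eq. (33)
U4 = hologram([("S1", "S1"), ("S2", "S2"), ("S3", "S3"), ("S4", "R4")])  # Eq. (34)

U_total = U4 @ U3 @ U2 @ U1          # stack of four holograms, Eq. (35)
cnot_block = U_total[:4, :4]         # action restricted to the signal states

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
print(np.array_equal(cnot_block, CNOT))
```

The signal block of the product reproduces the matrix of Eq. (26), and no amplitude is left in the reference states after the fourth hologram.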

The  advantage  of  stacking  the  holograms  is  that  one  can  make  the  hologram  thicker,  thereby 
increasing  the  efficiency;  however,  achieving  and  maintaining  the  proper  alignment  should  be 
more problematic. By multiplexing, we would need two holograms, each with two independent
recordings  in  them.  The  first  would  be  equivalent  to  the  last  two  holograms  in  Fig.  27,  while  the 
second  would  be  equivalent  to  the  first  two  and  would  just  redirect  the  reference  beams  into  their 
corresponding  signal  states.  The  first  two  recordings  are  complementary  to  the  second  two  —  thus 
in  some  sense  we  are  recording  the  "square  root"  of  the  CNOT  gate. 

4.4  Entangled  Bell  state  evolution  with  topological  protection:  simulation 

In  this  section  we  discuss  the  simulation  results  of  the  creation  of  an  entangled  Bell  state  in  a  2D 
cluster  state  (CS)  and  its  evolution  under  random  depolarized  noise  errors  while  undergoing  error 
correction.  The  end  result  is  that  cluster  states  offer  a  threshold  for  Bell  state  creation/evolution 
that  increases  with  the  2D  lattice  size  in  which  it  is  encoded.  The  error  threshold  rates  found, 
0.052 for a lattice of edge length $l=13$ and 0.083 for $l=29$, are significantly higher than the severe
$10^{-4}$ (at best $10^{-2}$) single qubit error threshold rates encountered in the usual quantum circuit
model. This is an indication of the topological protection resulting from the use of cluster states
and measurement based quantum computation. The details of the Bell state creation and error
correction in 2D and 3D are quite involved. For clarity we illustrate the concepts in the
simpler 1D cluster state lattices. It should be recalled that 1D cluster states are not universal for
quantum computation, hence our studies took place on 2D and 3D lattices.

Figure 28. Simple cluster states, their 1D wavefunction representations and stabilizer generators.

Consider the simplest CS, the 1D chain in Fig. 28a consisting of vertices $i=1$ and $j=2$ connected
by an edge. The preparation of the CS is as follows: each vertex (black dot) represents a qubit $i$
prepared in the state $|+\rangle_i = (|0\rangle_i + |1\rangle_i)/\sqrt{2}$, the $+1$ eigenstate of the operator $X_i$. The edge (black
line) represents the Control-Z operation $CZ^i_j$ acting between the qubits $i$ and $j$. The operator $CZ^i_j$
is a diagonal matrix with entries $\{1,1,1,-1\}$, with rows and columns labeled by the computational
basis states $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$, i.e. $CZ^i_j |11\rangle = -|11\rangle$, while the other three states are
unchanged. The cluster state $|G_a\rangle$ in Fig. 28a is then given by $|G_a\rangle = CZ^1_2 |++\rangle_{12} =
(|0+\rangle_{12} + |1-\rangle_{12})/\sqrt{2}$, which is easily verified using $|++\rangle_{12} = (|00\rangle_{12} + |01\rangle_{12} + |10\rangle_{12} + |11\rangle_{12})/2$,
applying $CZ^1_2 |11\rangle_{12} = -|11\rangle_{12}$ and regrouping terms.


An alternative method called the stabilizer formalism [Gottesman97, Nielsen00] more simply
defines this state than the above procedure of writing out all (in general $2^n$ for $n$ qubits)
wavefunction components. Using the results $X_i |\pm\rangle_i = X_i (|0\rangle_i \pm |1\rangle_i)/\sqrt{2} = \pm|\pm\rangle_i$, $Z_i |0\rangle_i = |0\rangle_i$
and $Z_i |1\rangle_i = -|1\rangle_i$, it is easy to verify that the state $|G_a\rangle = CZ^1_2 |++\rangle_{12}$ is the (unique) $+1$ eigenstate
of the operators $\{X_1 Z_2, Z_1 X_2\}$ (first row, third column of Fig. 28a). These operators are called the
stabilizer generators of the state $|G_a\rangle$ in that all products of these generators also stabilize $|G_a\rangle$,
i.e. return a $+1$ eigenvalue.

Row (b) of Fig. 28 depicts a 4-qubit CS chain $|G_b\rangle$ created by putting each qubit in the state $|+\rangle_i$
and acting with $CZ^i_{i+1}$ gates between adjacent pairs. Since each CZ gate is diagonal, they all
commute, so this operation can be done in parallel, a significant feature of OWQC (or CSQC).
The 4-qubit linear chain $|G_b\rangle = CZ^1_2 CZ^2_3 CZ^3_4 |++++\rangle$ is stabilized by the generators (given in
the third column) $\{X_1 Z_2, Z_1 X_2 Z_3, Z_2 X_3 Z_4, Z_3 X_4\}$. In general, i.e. not just in 1D, the CS $|G\rangle$ is
stabilized by all generators of the form $S_i = X_i \prod_{j \in N_G(i)} Z_j$, where the index $i$ runs over all
vertices (qubits) in the graph $G$ and $N_G(i)$ represents the neighborhood of vertex $i$, i.e. all vertices
connected to vertex $i$ by an edge. The cluster state $|G\rangle$ is then the unique $+1$ eigenstate of all
products of the generators $\{S_i\}$.
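The stabilizer statements above are easy to verify by brute force for short chains. The following is a minimal sketch, assuming the convention that qubit 1 is the leftmost tensor factor; the helper names (`cz`, `chain_cluster_state`, `chain_generators`) are illustrative and not taken from the report.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
PLUS = np.array([1.0, 1.0]) / np.sqrt(2)

def kron_all(factors):
    """Tensor product of a list of vectors or matrices, qubit 1 leftmost."""
    return reduce(np.kron, factors)

def cz(n, i, j):
    """Control-Z between qubits i and j (1-based) on n qubits, as a diagonal matrix."""
    diag = np.ones(2 ** n)
    for k in range(2 ** n):
        bits = format(k, f"0{n}b")
        if bits[i - 1] == "1" and bits[j - 1] == "1":
            diag[k] = -1.0
    return np.diag(diag)

def chain_cluster_state(n):
    """Prepare |+>^n and apply CZ between every adjacent pair of the 1D chain."""
    state = kron_all([PLUS] * n)
    for i in range(1, n):
        state = cz(n, i, i + 1) @ state
    return state

def chain_generators(n):
    """Generators S_i = X_i * (Z on the neighbors of i) for the 1D chain."""
    gens = []
    for i in range(1, n + 1):
        ops = {i: X}
        if i > 1:
            ops[i - 1] = Z
        if i < n:
            ops[i + 1] = Z
        gens.append(kron_all([ops.get(q, I2) for q in range(1, n + 1)]))
    return gens

for n in (2, 4):
    g = chain_cluster_state(n)
    print(n, all(np.allclose(s @ g, g) for s in chain_generators(n)))
```

For $n=2$ this checks $\{X_1 Z_2, Z_1 X_2\}$ on $|G_a\rangle$, and for $n=4$ the four generators listed for $|G_b\rangle$.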


We illustrate the construction of a maximally entangled Bell state in a 3-qubit 1D chain

$$|G_3\rangle = CZ^1_2 CZ^2_3 |+++\rangle_{123} = |0\rangle_2 \frac{|00\rangle_{13} + |01\rangle_{13} + |10\rangle_{13} + |11\rangle_{13}}{2\sqrt{2}} + |1\rangle_2 \frac{|00\rangle_{13} - |01\rangle_{13} - |10\rangle_{13} + |11\rangle_{13}}{2\sqrt{2}},$$

where the last equality follows from expanding out $|G_3\rangle$ and factoring out the states $|0\rangle_2$ and
$|1\rangle_2$ to the far left. Let us now make a measurement of qubit 2 in the X-basis. When qubit 2
returns a value of $\pm 1$, the state $|G_3\rangle$ is projected onto $|\pm\rangle_2$, which we denote as
$|G_3\rangle \mapsto |\pm\rangle_2\, {}_2\langle \pm | G_3 \rangle$. Projection onto the $+1$ eigenstate of $X_2$ returns the state
$|+\rangle_2 (|00\rangle_{13} + |11\rangle_{13})/\sqrt{2}$, which is the symmetric maximally entangled Bell state
$|\beta_{00}\rangle_{13} = (|00\rangle_{13} + |11\rangle_{13})/\sqrt{2}$ on qubits 1 and 3. Projection onto the $-1$ eigenstate of $X_2$ returns
$(|01\rangle_{13} + |10\rangle_{13})/\sqrt{2}$, which is a different maximally entangled Bell state on qubits 1 and 3
that can be converted into the previous one by the application of the operator $X_1$ (the application
of the last operator $X_1$ is termed modulo local Pauli corrections). The result is general for
one-dimensional CS chains of $n$ qubits. That is, the measurement of qubits $i=2$ to $i=n-1$ in the X-basis
returns the symmetric maximally entangled Bell state $|\beta_{00}\rangle_{1n} = (|00\rangle_{1n} + |11\rangle_{1n})/\sqrt{2}$, modulo
local Pauli corrections.
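The 3-qubit projection described above can be reproduced in a few lines. This is a sketch under the assumption that the state is stored as a tensor `psi[q1, q2, q3]`; the function name `project_qubit2` is illustrative.

```python
import numpy as np

PLUS = np.array([1.0, 1.0]) / np.sqrt(2)
MINUS = np.array([1.0, -1.0]) / np.sqrt(2)

# |G3> = CZ(1,2) CZ(2,3) |+++>, stored as a tensor psi[q1, q2, q3].
psi = np.einsum("a,b,c->abc", PLUS, PLUS, PLUS)
psi[1, 1, :] *= -1.0   # CZ on qubits 1 and 2: sign flip when both are 1
psi[:, 1, 1] *= -1.0   # CZ on qubits 2 and 3: sign flip when both are 1

def project_qubit2(state, outcome_vec):
    """Project qubit 2 onto outcome_vec; return the normalized (q1, q3) state and its probability."""
    reduced = np.einsum("b,abc->ac", outcome_vec.conj(), state)
    prob = float(np.sum(np.abs(reduced) ** 2))
    return reduced / np.sqrt(prob), prob

bell_plus, p_plus = project_qubit2(psi, PLUS)      # X2 outcome +1
bell_minus, p_minus = project_qubit2(psi, MINUS)   # X2 outcome -1
corrected = bell_minus[::-1, :]                    # apply X to qubit 1 (swap rows 0 and 1)
print(p_plus, p_minus)
print(bell_plus)
print(corrected)
```

Both outcomes occur with probability 1/2; the $+1$ outcome leaves $(|00\rangle + |11\rangle)/\sqrt{2}$ on qubits 1 and 3, and the $-1$ outcome leaves $(|01\rangle + |10\rangle)/\sqrt{2}$, which the $X_1$ correction maps to the same state.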

We now briefly consider the issue of quantum error correction (QEC) in terms of the stabilizer
formalism (for which it was originally intended, see [Gottesman97]). Suppose we use repetition
coding to encode single qubit states into logical states composed of 3 qubits, $|0\rangle \mapsto |0\rangle_L = |000\rangle$
and $|1\rangle \mapsto |1\rangle_L = |111\rangle$. A general logical qubit $|\psi\rangle_L = a|0\rangle_L + b|1\rangle_L = a|000\rangle + b|111\rangle$ is stabilized
by generators $\{Z_1 Z_2, Z_2 Z_3\}$. Suppose errors occur as single bit-flips $X_i$. Consider for example the
action of a bit-flip error on the first qubit, $|\psi\rangle_L \mapsto X_1 |\psi\rangle_L = a|100\rangle + b|011\rangle$. A measurement of
the generators on the corrupted state $X_1 |\psi\rangle_L$ now yields $\{Z_1 Z_2, Z_2 Z_3\} X_1 |\psi\rangle_L = \{-1, +1\} X_1 |\psi\rangle_L$. This is
called a syndrome measurement. Instead of the generators acting on the state, we can find the
result of the syndrome by transforming the generators by the general unitary error operator $U$ as
$U \{Z_1 Z_2, Z_2 Z_3\} U^\dagger$ (note that $X^\dagger = X$, $Z^\dagger = Z$). Using the result that $XZX = -Z$ ($ZXZ = -X$) and that
operators for different qubits commute, we have $X_1 \{Z_1 Z_2, Z_2 Z_3\} X_1 = \{-Z_1 Z_2, Z_2 Z_3\}$, whose signs again
reproduce the syndrome $\{-1, +1\}$ found above for the corrupted state $X_1 |\psi\rangle_L$. In a similar manner we find that
a bit flip on the second qubit yields the syndrome $X_2 \{Z_1 Z_2, Z_2 Z_3\} X_2 = \{-Z_1 Z_2, -Z_2 Z_3\}$, while a bit
flip on the third qubit yields the syndrome $X_3 \{Z_1 Z_2, Z_2 Z_3\} X_3 = \{Z_1 Z_2, -Z_2 Z_3\}$. Of course, the
absence of any bit flip error yields $I \{Z_1 Z_2, Z_2 Z_3\} I = \{Z_1 Z_2, Z_2 Z_3\}$. Collecting these results, we see
in Fig. 29 that the syndrome measurements of the generators of the logical qubit $|\psi\rangle_L$ give us
a unique signature of which bit was flipped, and what corrective action needs to be performed.

    Z1Z2    Z2Z3    Error Type       Corrective Action
    +1      +1      no error         no action
    -1      +1      bit 1 flipped    flip bit 1 with X1
    -1      -1      bit 2 flipped    flip bit 2 with X2
    +1      -1      bit 3 flipped    flip bit 3 with X3

Figure 29. Error correction for 3-qubit bit flip code in stabilizer formalism.

Although this example is fairly simple, it demonstrates the utility of tracking single qubit errors
(the most common error model) on a state in terms of the measurements of the stabilizer of the
state. This is particularly important in the general case of $n$-qubit quantum states, where the
number of stabilizers (to keep track of) scales polynomially as $O(n^2)$, whereas the number of
quantum amplitudes scales exponentially as $O(2^n)$ [Nielsen00, Campbell09].
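The syndrome table of Fig. 29 follows from a simple rule: a bit flip $X_k$ changes the sign of a generator exactly when qubit $k$ lies in that generator's support. A minimal sketch of this bookkeeping is given below; it treats code words as classical bit lists for illustration, and the names `syndrome`, `measure_syndrome` and `correct` are hypothetical helpers, not part of the report.

```python
# Supports of the two stabilizer generators of the 3-qubit bit flip code.
GENERATORS = {"Z1Z2": (1, 2), "Z2Z3": (2, 3)}

def syndrome(flipped_qubit):
    """Signs of (Z1Z2, Z2Z3) after a bit flip on `flipped_qubit` (None means no error)."""
    return tuple(-1 if flipped_qubit in support else +1
                 for support in GENERATORS.values())

# Lookup table: syndrome -> qubit to flip back (None means no action).
LOOKUP = {syndrome(q): q for q in (None, 1, 2, 3)}

def measure_syndrome(bits):
    """Parity checks Z1Z2 and Z2Z3 evaluated on a computational basis state."""
    return ((-1) ** (bits[0] ^ bits[1]), (-1) ** (bits[1] ^ bits[2]))

def correct(bits):
    """Return a corrected copy of a 3-bit basis state with at most one flipped bit."""
    fixed = list(bits)
    q = LOOKUP[measure_syndrome(bits)]
    if q is not None:
        fixed[q - 1] ^= 1   # apply X_q
    return fixed

for q in (None, 1, 2, 3):
    print(q, syndrome(q))
print(correct([1, 0, 0]), correct([0, 1, 1]))
```

The same lookup applies to superpositions $a|000\rangle + b|111\rangle$, since both components of a corrupted code word share one syndrome.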

We now extend these concepts to a 3D cluster state in the form of a rectangular grid and single out
the first spatial direction on the cluster as a "simulated time." We consider a perpendicular 2D
slice of this cluster into which we encode a logical symmetric Bell state in what is called a
surface code [Bravyi05]. The qubits in the cluster state are subdivided into code and syndrome
qubits. Measurement of the syndrome qubits in the X-basis projects the code qubits into a
surface  code  state.  In  a  3D  cluster  consisting  of  many  linked  2D  slices,  measurement  of  the  code 
qubits  results  in  teleportation  of  the  encoded  state  from  one  slice  to  the  next  (plus  local 
Hadamard  gates),  and  measurement  of  the  syndrome  qubits  amounts  to  measurement  of  the 
surface  code  stabilizer  [Briegel08]. 

[Figure 30 plot. Title: Surface code error rates; L = lattice size, d = code distance = (L+1)/2. Horizontal axis: input error rate, $p_{in}$.]

Figure 30. Error rates for the encoded Bell state in a 3D lattice of size $d \times (2d+1) \times (2d+1)$.


The simulation results are illustrated in Fig. 30. In the simulation, we have assumed a
depolarized noise (DPN) model in which an error is applied with probability $p_{in}$, yielding an
output error with probability $p_{out}$ for the state to be prepared in the Bell state. Fig. 30 illustrates
the relationship between the input error rate and the output error rate of a Bell state creation
using surface codes for varying lattice sizes. The results are obtained from a Monte-Carlo
simulation that uses a DPN channel. In other words, the probabilities of applying each of the
Pauli errors are all the same and equal to $p_{in}/3$, where the 3 in the denominator refers to X-, Y-,
and Z-errors. As is the case with surface codes, the greater the lattice dimension, $l$ (denoted as L
in Fig. 30), the higher the distance of the code, $d$ (where a code with distance at least $d = 2t+1$ can
correct errors on $t$ bits). Specifically, the code distance is directly related to the lattice size by $d =
(l+1)/2$. Without including the cost of overhead, codes with higher distances are always more
desirable as they relate directly to the number of errors that a code can tolerate without failure.
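As a worked illustration of the relations $d = (l+1)/2$ and $d = 2t+1$, and of the depolarizing channel described above, the sketch below tabulates the code distance for the lattice sizes of Fig. 30 and samples single-qubit Pauli errors with probability $p_{in}/3$ each. It is not the Monte-Carlo code used for Fig. 30; the function names and the integer encoding of the Paulis are assumptions made for illustration.

```python
import numpy as np

def code_distance(l):
    """Code distance d = (l + 1) / 2 for an odd lattice size l."""
    if l < 1 or l % 2 == 0:
        raise ValueError("this sketch assumes an odd lattice size")
    return (l + 1) // 2

def correctable_errors(d):
    """Largest t with d >= 2t + 1."""
    return (d - 1) // 2

def sample_depolarizing(num_qubits, p_in, rng):
    """Sample one Pauli per qubit: 0 = no error, 1 = X, 2 = Y, 3 = Z."""
    return rng.choice(4, size=num_qubits,
                      p=[1.0 - p_in, p_in / 3, p_in / 3, p_in / 3])

for l in (5, 13, 29, 61):
    d = code_distance(l)
    print(l, d, correctable_errors(d))

rng = np.random.default_rng(0)
errors = sample_depolarizing(200_000, 0.05, rng)
print((errors != 0).mean())   # close to p_in = 0.05
```

For the lattice sizes 5, 13, 29 and 61 this gives distances 3, 7, 15 and 31, i.e. 1, 3, 7 and 15 correctable errors.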
Figure 30 shows the fidelity curves for logical Bell state preparation in the surface codes with
dimensions $l$ of 5, 13, 29, and 61. The black line represents the fault-tolerant threshold line,
where the input error rate is equal to the output error rate. Regions of the curves below the
threshold line indicate regions where we can perform quantum computations (in this case, create
logical Bell states) with asymptotically arbitrarily small error using concatenation encoding or
larger lattice dimension. In the context of surface codes, lower output error rates $p_{out}$ are achieved
via larger lattices as opposed to concatenation. The fault tolerant threshold pertains to the
ideal, infinite size lattice, for which the output error $p_{out}$ is zero for input error $p_{in}$ below threshold,
and one for input error above threshold. As particular examples, the simulation results show that
the threshold for the Bell pair creation is about 0.052 for the code with lattice size $l = 13$ (solid
green circle in Fig. 30) and 0.083 for the surface code with lattice size $l = 29$ (solid red circle in
Fig. 30). These threshold rates of approximately 5% and 8% are typically an order of magnitude
higher than the most promising threshold rates obtained using the standard quantum circuit
model.

4.5  Quantum  information  science  testbed 

Validation of the testbed was a twofold process: generating entangled photons and analyzing the
quality of the photons that were produced. The former consisted of exciting the crystal with a
pump beam and visualizing the output photons with a single photon CCD camera. The latter was
accomplished with a set of measurements that allowed us to map the state of the photon system,
called quantum state tomography.

Consider  the  generation  of  the  entangled  photons  via  SPDC.  In  the  first  row  of  Fig.  31  we  show 
the  results  of  sending  the  CW  laser  through  the  two-crystal  down-converter.  For  this  type-I 
SPDC  source  there  were  two  concentric  rings  of  orthogonal  polarization.  Both  rings  correspond 
to photons that have the same wavelength of 810 nm with a bandwidth of 10 nm. This was
selected by a band-pass filter located in front of the camera. Other filters also placed in front of
the camera stopped the part of the pump beam that did not create down-converted light. Each of
the  rings  was  generated  by  one  of  the  crystals  of  the  two-crystal  stack.  The  diameter  of  each  ring 
(cone)  was  controlled  by  the  tilt  of  the  corresponding  crystal  that  generated  it.  The  tilt  axes  of  the 
crystals  were  orthogonal  to  each  other,  and  were  contained  in  a  plane  perpendicular  to  the  input 
beam  axis.  This  configuration  allowed  the  diameter  of  the  horizontally  polarized  ring  to  be 
controlled  by  tilt  of  the  crystal  about  a  horizontal  axis,  and  the  diameter  of  the  vertically 
polarized  ring  by  a  tilt  about  the  vertical  axis. 
Figure 31: Imaging of the down-converted light for three different configurations. First
row: type-I SPDC as a function of the tilt of one crystal. Second row: type-I SPDC rings of
different diameters as a function of the polarization of the pump beam (horizontal on the
left and vertical on the right). Third row: type-II SPDC rings as a function of the tilt of the
crystal. All cases involve the CW pump laser beam.

In the first row of Fig. 31 the tilt about one axis was varied while the other axis remained fixed.
From frame to frame we see the diameter of one ring decreasing (from left to right) while the
other ring remained at a fixed diameter. The tilt setting for generating polarization-entangled
photons is the third image from the left, a case where both rings have the same diameter. The
crucial point is that when the overlap of the rings is perfect it is impossible to know from which
crystal the light originated. Photons that are partners of each other appear at points diametrically
opposite to each other along the ring. If we select two small regions opposite each other, say on a
horizontal plane, then it is uncertain whether the photon partners are both horizontally
polarized or both vertically polarized. The quantum state of the light is said to be entangled in
polarization, and given by

$$|\psi\rangle = \frac{1}{\sqrt{2}} \left( |H\rangle_1 |H\rangle_2 + e^{i\delta} |V\rangle_1 |V\rangle_2 \right),$$  (36)

where the subscripts denote the particle labeling, and H and V denote the polarization labeling.
The equation is written in the Dirac formalism of quantum mechanics. The variable $\delta$ is a phase
due to the birefringence of the crystals. It can be adjusted by tilting a wave plate located before
the SPDC crystals.

For the case of type-I SPDC entangled photon generation the pump beam has a polarization that
is orthogonal to that of the down-converted photons. For example, vertically-polarized photons
are produced by the horizontal component of the polarization of the pump beam, and conversely,
horizontally-polarized photons are produced by the vertical component of the pump beam. For
the frames shown in the top row of Fig. 31 the pump beam had equal intensity on both
polarization components. This was achieved by orienting the polarization of the pump beam at an
angle of 45° with respect to the horizontal. Since the polarization of the laser is vertical, we
rotated it by means of a half-wave plate (HWP), as shown in Fig. 32. A tilted wave plate (WP)
was used to adjust $\delta$. The figure also shows the two-crystal system (BBO) and illustrates how the
camera was located for generating the images of Fig. 31. For now let us ignore the other
elements in the path of the light after the crystal. The sequence of frames in the second row of
Fig. 31 corresponds to both crystals having fixed but different tilts so as to produce rings of
different diameter. In this sequence of frames we changed the polarization orientation of the
pump beam from horizontal on the left to vertical on the right. We saw one ring on the left,
which is consistent with down-converted light coming from only one crystal. As the polarization
of the pump beam was rotated the second ring of light coming from the second crystal appeared
while the first one gradually disappeared. This continued until the last frame, where only one ring
remained owing to only the single polarization component of the pump beam (vertical)
producing down-converted light from only one crystal.

Figure 32. Diagram of the setup to produce and diagnose polarization-entangled photons via
type-I SPDC. The components of the figure are: half-wave plates (HWP), quarter-wave plates
(QWP), polarizing beam splitter (PBS), beta-barium-borate SPDC crystals (BBO), wave plate
(WP), band-pass filters (F) and avalanche photodiodes (APD).

The third row of Fig. 31 shows the images for entangled photon generation via type-II SPDC
with a single crystal, for different tilts of the crystal. In this case, rings of differing polarization
were non-collinear. At the intersection points of the two rings the photon pair is entangled in
polarization. The quantum state of the photon pairs emerging from the intersection points was

$$|\psi\rangle = \frac{1}{\sqrt{2}} \left( |H\rangle_1 |V\rangle_2 + e^{i\delta} |V\rangle_1 |H\rangle_2 \right).$$  (37)

Adjustment of $\delta$ required two additional compensating crystals placed in the path of the light.
The third row images of Fig. 31 depict how the rings changed their diameter as a function of the
tilt of the crystal. We also investigated a configuration that corresponded to the collinear
propagation of the light. This corresponds to the arrangement of the tilt of the crystal that
produced the output shown in the third image from the left, where the two rings overlap almost
tangentially. Adjustment of these compensators allows for optimization of the states' fidelity.
The full quantum state can be characterized through state tomography.
Quantum state tomography is the process of reconstructing the density matrix of a quantum
system from experimental data. Through a series of measurements, dependent upon the degrees
of freedom in the system, one can construct a graphical representation of the quantum system's
state. For the purpose of this discussion polarization is the only degree of freedom (of several
possible) considered. These quantum systems are inherently described by a linear combination of
probability amplitudes and eigenstates. Thus, reconstructing the density matrix provides essential
information about the composition and quality of the system under experimental investigation.

Although quantum state tomography is well defined and understood in both theory and
experiment, there are various practical hurdles that impede its fluid implementation in a
laboratory environment. A matrix of $4^n$ elements ($n$ being the number of quantum bits) must be
populated to fully characterize a quantum optical signal. In order to alleviate the challenge of
performing $4^n$ calculations each time a density matrix had to be populated with experimental data,
an automated protocol was developed (in MATLAB). On input this program takes in the $4^n$
values of predefined measurements [James01] and outputs the various calculations that are
needed to analyze the integrity of a quantum system. Values for fidelity, coherence, tangle, and
entanglement of formation are calculated and provided in conjunction with a graphical
representation of the density matrix (Fig. 33a,b). This protocol allows for a readily available
analytical tool that is fully reconfigurable and scalable. It requires little computing power and
may be executed on the fly in the laboratory.
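To make the listed figures of merit concrete, the sketch below computes fidelity, tangle and entanglement of formation from a two-qubit density matrix. It is written in Python rather than MATLAB and is not the in-house protocol: it assumes the density matrix has already been reconstructed, uses the standard Wootters concurrence formulas, and all function names (including the test state `bell_state`, modeled on the form of Eq. (36)) are illustrative.

```python
import numpy as np

def bell_state(delta=0.0):
    """(|HH> + exp(i*delta) |VV>)/sqrt(2) in the basis {HH, HV, VH, VV}."""
    return np.array([1.0, 0.0, 0.0, np.exp(1j * delta)]) / np.sqrt(2)

def density_matrix(psi):
    return np.outer(psi, psi.conj())

def fidelity_with_pure(rho, target):
    """Fidelity <target| rho |target> of rho with a pure target state."""
    return float(np.real(target.conj() @ rho @ target))

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    sy = np.array([[0.0, -1j], [1j, 0.0]])
    yy = np.kron(sy, sy)
    r = rho @ yy @ rho.conj() @ yy
    lam = np.sqrt(np.clip(np.real(np.linalg.eigvals(r)), 0.0, None))
    lam = np.sort(lam)[::-1]
    return float(min(1.0, max(0.0, lam[0] - lam[1] - lam[2] - lam[3])))

def tangle(rho):
    return concurrence(rho) ** 2

def entanglement_of_formation(rho):
    c = concurrence(rho)
    x = (1.0 + np.sqrt(max(0.0, 1.0 - c ** 2))) / 2.0
    if x >= 1.0:
        return 0.0
    return float(-x * np.log2(x) - (1.0 - x) * np.log2(1.0 - x))

rho = density_matrix(bell_state(delta=np.pi / 2))
print(fidelity_with_pure(rho, bell_state(0.0)), tangle(rho), entanglement_of_formation(rho))
```

In this example a phase $\delta = \pi/2$ lowers the fidelity with the $\delta = 0$ Bell state to $\cos^2(\delta/2) = 0.5$ while the tangle stays at 1, which illustrates why the phase is adjusted separately from the amount of entanglement.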

As we approach the limits of two qubit analysis and move on to higher order state spaces, this
tomographic algorithm allows us to easily tailor the calculations to the nth order and scale the
density matrix accordingly. In a follow-on to this in-house project we will investigate the
possibility of expanding our state space while limiting the number of measurements needed to
populate its density matrix. This can be accomplished through the use of "entanglement
witnesses" [Toth05]. This approach utilizes an algorithm similar to our tomographic code while
being able to populate density matrices of much larger $n$ without measuring all possible $4^n$
elements individually. In Fig. 33a and Fig. 33b we show a graphical representation of the two
quantum states (density matrices) given in Eq. (36) and Eq. (37), respectively, produced by
SPDC. The density matrices provide a description of the quality of entanglement between the
two photons. The diagonal elements of the density matrix provide probability information for the
system to be in a given computational basis state $\{|HH\rangle, |HV\rangle, |VH\rangle, |VV\rangle\}$. The off-diagonal
elements provide information about the coherence of the system. The entropy and other
entanglement measures may also be calculated from these matrices.

Figure 33: Tomographic reconstruction of density matrix from experimental data, panels (a) and (b).

4.6  Multi-crystal  lattices 

Physical implementation of a group velocity matched (GVM) crystal in the 800 nm regime is a
non-trivial task. Materials do not naturally occur with this property in this particular wavelength
regime as they do in the region of 1.5 µm. The design illustration in Fig. 34 bears some
resemblance to ‘quasi-phase matching’ (QPM), but there are important distinctions. In QPM,
calibrated periodic poling reverses the sign of the nonlinear coefficient such that the periodicity
effect can compensate for the phase mismatch in a medium which otherwise could not exhibit
‘non-critical’ phase-matching. Here, the nonlinear (NL) β-barium borate (β-BBO) crystals are
already phase matched; it is the group velocities which must be made to match as well. This can
be viewed as a generalization of phase matching to include overlap of the photon propagation
vectors $k_\mu$, where $\mu = i, s, p$ denote idler, signal and pump respectively. The phase is the
zero order term in a Taylor expansion of the propagation vector ($k_{i,s}$), while (inverse) group
velocity is the first order term. Unlike QPM however, GVM cannot be synthesized in a single medium;
two or more media with proper complementary properties are required for a “compensated
assembly” [U’Ren06, Erdmann00]. Though physical difficulties delayed earlier investigations,
progress in crystal fabrication has brought the feasibility and cost within a reasonable range (Fig.
34).
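
The Taylor expansion referred to above can be written out as follows. This is the standard dispersion expansion about a central frequency; the symbols $\omega_0$, $k'_\mu$ and $v_{g,\mu}$ are notation introduced here for illustration and are not taken from the report.

```latex
\[
  k_\mu(\omega) \;=\; k_\mu(\omega_0)
  \;+\; k'_\mu\,(\omega-\omega_0)
  \;+\; \tfrac{1}{2}\,k''_\mu\,(\omega-\omega_0)^2 + \dots ,
  \qquad
  k'_\mu \equiv \left.\frac{dk_\mu}{d\omega}\right|_{\omega_0} = \frac{1}{v_{g,\mu}},
  \qquad \mu = i, s, p .
\]
```

Phase matching constrains the zero order terms $k_\mu(\omega_0)$, while group velocity matching constrains the first order coefficients $k'_\mu$, i.e. the inverse group velocities.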


Figure  34.  Type-II  custom  assembly  showing  alternating  BBO  (red)  and  calcite  (blue)  segments. 

The  orientation  of  the  crystal  phase  matching  function  (PMF)  is  determined  by  conservation  of 
momentum  for  the  propagation  components  along  the  respective  crystal  axes.  Note  that  the 
momentum  of  a  photon  in  a  medium  is  simply  k'  =  nk  where  the  index  of  refraction  n  embodies 
the  medium’s  effect  on  propagation.  The  width  or  spread  of  the  PMF  in  this  case  is  inversely 
related  to  the  crystal  length  and  is  orthogonal  to  that  of  the  pump  function  (which  embodies 
energy  conservation).  A  special  case  of  GVM  can  be  met  when  the  slope  of  the  crystal  function 
becomes  exactly  orthogonal  to  that  of  the  pump  function  and  the  widths  of  the  pump  and  crystal 
functions  are  engineered  to  be  equal  so  as  to  yield  a  separable  (factorizable)  state.  The  more 
general  conditions,  illustrated  in  Fig.  35,  involve  symmetry  about  a  vertical  (or  horizontal)  axis. 
Any asymmetry means that a spectral detection of one photon (e.g. signal) provides spectral
information regarding the second (idler) photon. To modify the orientation (or shape) of this
distribution one could add optical components (e.g. spectral filters), select a different source, or
modify the effective source.

[Figure 35 panel text: Arbitrary orientation of PMF. When GVM can be achieved, special applications
are enabled. In a long crystal one is known as the frequency correlated state |F.C.S.>; photon pair
members are detected at identical frequency, though each photon is broadband.]

Figure 35. Arbitrary possible orientations of the crystal function with varying BBO-calcite
thickness ratios.
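
The separability condition described above can be checked numerically. The sketch below is a minimal model, assuming Gaussian shapes for both the pump function and the PMF (the true PMF of a finite crystal is not Gaussian) and dimensionless frequency detunings; the function name and parameters are hypothetical. It builds the joint spectral amplitude on a grid and estimates the Schmidt number K, which equals 1 for a factorizable state and grows with spectral correlation.

```python
import math

def schmidt_number(sigma_pump, sigma_pmf, pmf_angle_deg, n=61, span=4.0):
    """Estimate the Schmidt number K of a Gaussian joint spectral amplitude.

    Assumed model: the pump function varies along the diagonal (ws + wi), and the
    PMF is a Gaussian ridge at angle pmf_angle_deg from the signal-frequency axis.
    """
    theta = math.radians(pmf_angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    grid = [-span + 2.0 * span * i / (n - 1) for i in range(n)]
    amp = []
    for ws in grid:
        row = []
        for wi in grid:
            u = (ws + wi) / math.sqrt(2.0)  # coordinate across the pump function
            v = s * ws - c * wi             # distance from the PMF ridge
            row.append(math.exp(-u * u / (2 * sigma_pump ** 2)
                                - v * v / (2 * sigma_pmf ** 2)))
        amp.append(row)
    # Reduced (signal) density matrix rho = F F^T, then K = 1 / Tr(rho^2).
    rho = [[sum(amp[i][k] * amp[j][k] for k in range(n)) for j in range(n)]
           for i in range(n)]
    trace = sum(rho[i][i] for i in range(n))
    purity = sum(rho[i][j] ** 2 for i in range(n) for j in range(n)) / trace ** 2
    return 1.0 / purity

k_matched = schmidt_number(1.0, 1.0, 45.0)   # orthogonal functions, equal widths
k_unequal = schmidt_number(1.0, 0.3, 45.0)   # orthogonal, but widths differ
k_rotated = schmidt_number(1.0, 1.0, 0.0)    # equal widths, PMF not orthogonal
print(k_matched, k_unequal, k_rotated)
```

In this model only the first case (PMF orthogonal to the pump function and equal widths) yields K close to 1; changing either the width ratio or the orientation reintroduces spectral correlations.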

The  method  we  developed  in  this  work  made  use  of  custom  crystal  assemblies  (Fig.  34).  Each 
thin  (nonlinear)  BBO  segment  is  alternated  with  a  (linear)  medium,  which  is  also  birefringent. 
The  linear  medium  is  not  phase  matched  and  hence  does  not  generate  SPDC.  The  effect  of  the 
linear  medium  is  to  reverse  the  effect  of  pump  pulse  velocity  mismatch  in  BBO  compared  with 
that  of  the  SPDC  two-photon  wave-packet.  Calcite  has  been  identified  as  one  of  a  very  few 
crystals  with  the  requisite  properties  that  exhibit  this  effect  at  800  nm  (400  nm  pump  and 
shorter).  In  general,  for  sufficiently  thin  segments,  GVM  is  nearly  satisfied  throughout  the 
assembly,  and  deviations  from  the  ideal  case  can  be  calculated  from  the  actual  thickness  used. 
Our initial measurements of the polarization state tomography, illustrated in Fig. 36, deviated
from the theoretical expectation. Further measurements on these prototype assemblies will be
required to resolve this in future work.

Figure 36. Output entangled rings and tomography for initial custom assembly under
broadband pumping.

[Figure 37 panel labels]
(1) 1 mm type-II, CW pump: spectral width = 1.55 nm
(2) 1 mm BBO type-II, Ti:Sapphire 100 fs pump: spectral width = 9.29 nm
(3) Custom assembly type-II, Ti:Sapphire 100 fs pump: spectral width = 9.55 nm
6.5 dB
Loss measurement from inserting a 2 nm spectral bandpass filter:
1 mm BBO spectral width calculated: 8.82 nm
BBO-calcite assembly spectral width calculated: 10.47 nm

Figure 37. Spectral images of Type-II polarization entangled photons. Full image widths
along x-axes shown are 11, 22, and 14 nm respectively.

Arbitrary  control  over  phase  matching  function  orientation 

We focused our efforts on the establishment of GVM because of its wide utility, but the
segmented method presented above can also be extended to other applications. In particular,
this approach can produce arbitrary orientations of the PMF (Fig. 37).
In this case we may view the GVM condition as a special case of a spectral function oriented at
45°. Note that the pump orientation is always approximately 45°. The PMF orientation can be
rotated to any angle by simply adjusting the ratio of segment length between BBO and calcite,
something that cannot be accomplished with other known methods and for which special
applications have already been identified [U’Ren06, Erdmann00].
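
The dependence of the PMF orientation on the segment length ratio can be sketched with a simple length-weighted model. This is an assumption made for illustration, not the design calculation used in this work: each medium contributes a group-delay mismatch per unit length, tau_mu = k'_pump - k'_mu, and the PMF ridge follows tau_s*nu_s + tau_i*nu_i = 0. The numerical mismatch values below are hypothetical placeholders, not material data for BBO or calcite.

```python
import math

def pmf_angle_deg(ratio, nonlinear, spacer):
    """PMF ridge angle (degrees from the signal axis) for L_spacer / L_nonlinear = ratio.

    nonlinear, spacer: (tau_s, tau_i) group-delay mismatch per unit length in each
    medium (assumed length-weighted linear model).
    """
    tau_s = nonlinear[0] + ratio * spacer[0]
    tau_i = nonlinear[1] + ratio * spacer[1]
    # Ridge direction in the (nu_s, nu_i) plane is (tau_i, -tau_s).
    return math.degrees(math.atan2(-tau_s, tau_i))

def ratio_for_45_deg(nonlinear, spacer):
    """Length ratio giving tau_s = -tau_i, i.e. a ridge at 45 degrees."""
    return -(nonlinear[0] + nonlinear[1]) / (spacer[0] + spacer[1])

# Hypothetical placeholder mismatches (arbitrary units per unit length).
BBO_LIKE = (0.2, 0.1)
SPACER_LIKE = (-0.5, -0.1)

r45 = ratio_for_45_deg(BBO_LIKE, SPACER_LIKE)
for ratio in (0.0, 0.25, r45, 1.0):
    print(ratio, pmf_angle_deg(ratio, BBO_LIKE, SPACER_LIKE))
```

In this model the spacer must have mismatches of opposite sign to the nonlinear medium for the 45° orientation to be reachable, which is consistent with the requirement for complementary properties noted in the text.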

4.7  Schioedtei  entangled  photon  crystal  source 

The multipli-entangled photon source was designed and developed in two stages: a prototype
constructed in-house, and a second generation (version II) built by an outside vendor. The
prototype version of the Schioedtei assembly was constructed from two 8x8x2 mm type-II
beta-barium borate (β-BBO) crystals phase matched (at angles of theta = 41.9°, phi = 30°) for 810 nm
SPDC. Each of the crystals had a dualband AR coating for 405/810 nm on all faces, and the crystals
were placed in physical contact with each other in a constructed housing. Version II of the assembly
was constructed by an outside vendor since optically contacting the crystals is not an in-house
capability. Version II was dualband AR coated for 405/810 nm only on the exterior faces of the
assembly.

The verification and analysis testbed required for testing Schioedtei is shown in Fig. 38. The
experimental configuration required for testing with a pulsed pump consisted of a 15 Watt
CW vanadate laser operating at 532 nm (Millennia Pro 15sJ) pumping a 3.5 W, 100 fs Ti:Sapphire
laser operating at 810 nm (Tsunami 3960-15HP), passing through an SHG unit (Inspire Blue
FM), to produce ~100 fs pulses at 405 nm with an average power of 1.4 W. The 405 nm pulses
served as the input excitation beam for the Schioedtei assembly after first passing through a 6
mm quartz pre-compensator and a half-wave plate set to 22.5° to rotate the input linear
polarization to the required 45° for equal excitation of the crystals. This configuration also
allowed for CW mode testing, in which a 100 mW 405 nm diode laser was inserted into the setup
via a flip mirror and the pre-compensator was removed before the Schioedtei assembly. The
residual pump beam was collected in a beam dump, although it could just as easily have been
redirected with a mirror to pump further crystal stages. The cones of SPDC generated photon
pairs then propagated across approximately 0.5 meters of free space to obtain the useable spatial
separation required for detector access to the middle square of intersection points (5, 6, 7, 8).
Inserted into each of the twelve free space paths were compensators to eliminate the temporal
separation between the signal and idler photons due to the birefringence of the Schioedtei
assembly. The compensating crystals used for Schioedtei were 8x8x1 mm type-II phase matched
β-BBO (at angles of theta = 41.9° and phi = 30°) aligned orthogonally to their respective
counterparts in the Schioedtei crystal pair. These compensators could not be used for
compensation of a collinear configuration as they were phase matched for SPDC at 810 nm when
exposed to a 405 nm excitation beam.

Figure 38. Experimental testbed to analyze the Schioedtei source.
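
The action of the half-wave plate described above can be illustrated with Jones calculus. The sketch below assumes an ideal half-wave plate and a horizontally polarized input beam (the report does not state the input orientation explicitly); the function names are chosen for this example only.

```python
import math

def half_wave_plate(theta_deg):
    """Jones matrix of an ideal half-wave plate with fast axis at theta from horizontal."""
    t = math.radians(2.0 * theta_deg)
    return [[math.cos(t), math.sin(t)],
            [math.sin(t), -math.cos(t)]]

def apply(matrix, vec):
    """Multiply a 2x2 Jones matrix by a Jones vector (E_H, E_V)."""
    return [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
            matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]

horizontal = [1.0, 0.0]                      # assumed input polarization
out = apply(half_wave_plate(22.5), horizontal)

power_h, power_v = out[0] ** 2, out[1] ** 2  # fraction of power in H and V
angle = math.degrees(math.atan2(out[1], out[0]))
print(power_h, power_v, angle)
```

A plate angle of 22.5° rotates the linear polarization by twice that angle, so the pump power divides equally between the H and V components that excite the two crystals.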

Detection of the generated entangled photons was accomplished via fiber-coupled single photon
counting avalanche photodiodes (APDs) (Perkin Elmer SPCM-AQ4C). Collection apertures
consisted of fiber-coupled collimators, and spectral distinguishability of the photons was removed
by fiber-coupled 2 nm bandpass filters centered at 810 nm. Coincidence detection was
accomplished by connecting the four detectors to a coincidence counting module (CCM)
(Branning, Trinity College) shown in Fig. 39 [Branning11]. This board allowed for up to
four-fold coincidence detection via four input channels and eight reconfigurable outputs between
any of the four input channels.

Figure 39. Coincidence counting module (Branning, Trinity College [Branning11]) utilized
in the experimental testbed in Fig. 38.

A  single  photon  cooled  CCD  camera  (Princeton  Instruments  Pixis  1024BR)  allowed  for  direct 
viewing  of  the  SPDC  photons  produced.  Utilizing  this  camera  greatly  facilitated  alignment  of 
the  output  of  the  Schioedtei  assembly  to  the  preconfigured  collection  apertures  of  the 
collimators.  An  alignment  grid  with  pre-determined  locations  for  spots  5, 6, 7, 8  was  used  to 
approximately  align  the  Schioedtei  assembly  to  the  existing  collimators. 

A  long  exposure  image  from  the  CCD  camera  is  shown  in  Fig.  40.  The  twelve  overlap  regions 
are clearly visible and the spatial symmetry of the output should be clearly noted. The
orientation of the crystal assembly gives an approximately Gaussian profile on spots 5, 6, 7, 8 and a
slightly elongated profile for spots 1, 2, 3, 4, 9, 10, 11, 12. The central bright spot shown in the
middle of the image is residual 810 nm unfiltered pump beam and fluorescence from the color
glass filter used to block the CCD from the 405 nm excitation beam.


Figure  40.  Experimental  data  from  in-house  constructed  crystal  stack. 

Once the Schioedtei crystal assembly had been set into the correct orientation via the CCD
camera, a 630 nm visible laser was back propagated through the collimators to align the faces to
the center of the crystal, as shown in Fig. 41. The collimators were reconnected and final
alignment was accomplished by optimizing coincidence count rates on the selected channels on
the CCM.

Figure 41. Alignment image of the Schioedtei crystal stack.

The experimental data shown in Fig. 40 are the output of the in-house constructed Schioedtei
assembly. The image shows minor levels of scattering, which is attributed to the lack of optical
contacting between crystals 1 and 2 in the assembly. Scattering is also due to imperfections in the
crystal faces and slight angular tilting of the crystal faces with respect to one another in the
custom built housing. Overall, the output SPDC rings are well defined and approximately equal
in detected intensity on the CCD camera. Single channel count rates detected by the Si-APDs
averaged ~20,000 counts/sec. Coincidence count rates observed between any pair of spots
(1, 2, 3, 4, 5, 6, 7, 8) were ~2000 counts/sec, with 4-fold coincidence count rates between 1, 2, 3, 4
or 5, 6, 7, 8 in the 5-10 counts/sec range. Upon alignment and optimization of each of these
channels, a 2-photon quantum state tomography was accomplished on any of the diametric pairs.
Since reconfiguration of the collimators was required to observe spot sets 1, 2, 3, 4 (linear
arrangement) or 5, 6, 7, 8 (square arrangement), diametric pairs were chosen within each of these
sets. Insertion of quarter-wave plates, half-wave plates and polarizing beamsplitters (in that
respective order) into the free-space section following the compensator and preceding the
collimators was required for full tomographic analysis of the produced quantum state. The
resulting density matrix can be seen in Fig. 42. The resulting quantum state, while mixed and not
ideal, was a promising step towards the expected state
$|\psi\rangle = (|HV\rangle + |VH\rangle)/\sqrt{2}$ (fidelity:
$F = \langle\psi|\rho_{exp}|\psi\rangle = 0.65$, concurrence: $C = 0.53$, where
$C = 2|\alpha\delta - \beta\gamma|$ for
$|\psi\rangle = \alpha|HH\rangle + \beta|HV\rangle + \gamma|VH\rangle + \delta|VV\rangle$).

Figure 42. Experimental tomography data (density matrix) from in-house constructed
Schioedtei crystal stack.
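
The two figures of merit quoted above can be illustrated with a short Python sketch. It evaluates the fidelity F = <psi|rho|psi> against the target Bell state and the pure-state concurrence formula C = 2|alpha*delta - beta*gamma|. The white-noise mixture used as input is a hypothetical example, not the measured density matrix, and the pure-state formula does not apply directly to a mixed state (that would require the general mixed-state concurrence, which is not shown here).

```python
import math

def fidelity(psi, rho):
    """F = <psi| rho |psi> for a pure target state psi and a density matrix rho."""
    n = len(psi)
    return sum(psi[i].conjugate() * rho[i][j] * psi[j]
               for i in range(n) for j in range(n)).real

def pure_state_concurrence(psi):
    """C = 2|alpha*delta - beta*gamma| with psi ordered as (HH, HV, VH, VV)."""
    alpha, beta, gamma, delta = psi
    return 2.0 * abs(alpha * delta - beta * gamma)

def with_white_noise(psi, p):
    """Hypothetical noise model: p |psi><psi| + (1 - p) I/4."""
    n = len(psi)
    return [[p * psi[i] * psi[j].conjugate() + (1.0 - p) * (0.25 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

r = 1.0 / math.sqrt(2.0)
target = [complex(0), complex(r), complex(r), complex(0)]   # (|HV> + |VH>)/sqrt(2)
product_hv = [complex(0), complex(1), complex(0), complex(0)]

f_ideal = fidelity(target, with_white_noise(target, 1.0))
f_noisy = fidelity(target, with_white_noise(target, 0.5))
print(f_ideal, f_noisy, pure_state_concurrence(target), pure_state_concurrence(product_hv))
```

For the ideal target state both quantities equal 1, while a separable state such as |HV> has zero concurrence; added noise lowers the fidelity toward the value of 1/4 obtained for a maximally mixed state.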

To improve upon the prototype design, version II was constructed by a commercial vendor with
the capability of optically contacting the crystals in the assembly. Optical contacting allowed for
the removal of the dual band AR coating layers between the interfaces of crystals 1 and 2.
Version II of the Schioedtei assembly was recently delivered, but has not yet been characterized
fully enough for results to be included in this report. The initial images from the generated rings
are shown in Fig. 43. SPDC rings produced from the in-house designed/commercially-constructed
Schioedtei assembly showed greater uniformity in intensity along with a reduction in background
scatter.


Figure  43.  Experimental  data  from  in-house  designed,  commercially-constructed  crystal  stack. 

The Schioedtei source has immediate and direct application to the generation of cluster
states. Cluster states play a central role in the measurement-based one-way quantum computation
approach [Raussendorf01]. In this scheme, the entanglement resource is provided in advance
through an initial, highly entangled multi-particle cluster state, and is consumed during the
quantum computation by means of single-particle projective measurements. The feedforward
nature of the one-way computation scheme renders the quantum computation deterministic, and
removes much of the massive overhead that arises from the error encoding used in the standard
quantum circuit computation model [O’Brien07]. Fig. 44 illustrates a scheme for utilizing the
output of Schioedtei to generate a four photon cluster state, $|C_4\rangle$ [Schmid07]. This
particular example employs spots 1, 2, 3, 4 and requires insertion of two half-wave plates and a
controlled-phase (CPhase) gate. This scheme could be expanded to include the other eight spots
to generate even larger cluster states. Such experiments are currently being explored in-house in
a follow-on project.

$|C_4\rangle = \tfrac{1}{2}\left(|HHHH\rangle + |HHVV\rangle + |VVHH\rangle - |VVVV\rangle\right)_{1,2,3,4}$

Figure 44. Experimental setup for 4-qubit cluster state generation utilizing Schioedtei
crystal source.
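
The role of the controlled-phase gate can be sketched with a small state-vector calculation. The example assumes that the half-wave plates have already converted two source pairs into (|HH> + |VV>)/sqrt(2) states and that the CPhase gate acts between the second and third photons; both choices are assumptions made for illustration, since the exact arrangement is defined by Fig. 44.

```python
import math
from itertools import product

def bell_hh_vv():
    """(|HH> + |VV>)/sqrt(2) as a mapping {basis label: amplitude}."""
    a = 1.0 / math.sqrt(2.0)
    return {"HH": a, "VV": a}

def tensor(state_a, state_b):
    """Tensor product of two states stored as label -> amplitude mappings."""
    return {ka + kb: va * vb
            for (ka, va), (kb, vb) in product(state_a.items(), state_b.items())}

def cphase(state, q1, q2):
    """Controlled-phase gate: flip the sign when photons q1 and q2 (0-based) are both V."""
    return {label: (-amp if label[q1] == "V" and label[q2] == "V" else amp)
            for label, amp in state.items()}

# Two Bell pairs on photons (1,2) and (3,4), then CPhase on photons 2 and 3.
c4 = cphase(tensor(bell_hh_vv(), bell_hh_vv()), 1, 2)
for label in sorted(c4):
    print(label, round(c4[label], 3))
```

Only the |VVVV> term has V on both photons acted on by the gate, so it alone acquires the minus sign that distinguishes the cluster state from a simple product of two Bell pairs.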

An advantage of the Schioedtei configuration is the diversity of states that it is capable of
generating. Schioedtei allows for the direct generation of the (unnormalized) state
$|HV\rangle \pm e^{i\varphi}|VH\rangle$ as well as the generation of the state
$|HH\rangle \pm e^{i\varphi}|VV\rangle$ with the addition of a half-wave plate. In addition,
separable states such as $|HV\rangle \pm e^{i\varphi}|VV\rangle$ or
$|HV\rangle \pm e^{i\varphi}|HH\rangle$ can also be directly generated with clever combinations
of the twelve output intersections and proper compensation.

A path towards increasing the useable photon count rate in Schioedtei is the integration of the
GVM phase matching constraint (as discussed in section 4.6, see [U’Ren06]) into the crystal
construction. A GVM configuration is possible by alternating reduced thickness Schioedtei and
α-BBO layers (α-BBO is used as a compensator; there is no second order nonlinear effect in
α-BBO crystal due to the centric symmetry in its crystal structure). A source of this nature would
not only provide six spatially separate entangled pairs, but also alleviate the need for spectral
filtering of the photons. An increase in useable signal rates of 10X over a typical type-II source
is realizable with GVM.


5.0  CONCLUSIONS 

Grover’s  quantum  search  algorithm:  simulation 

Research conducted under this LRIR indicates that the hybrid coarse grain (distributed MPI)/fine
grain (multi-core GPGPUs) approach to numerical simulation of quantum algorithms shows
promise. As discussed in Section 4.1, an important item to consider for the applicability of
conventional parallel resources is the particular form of the quantum circuit decomposition. The
point here is that the most efficient decomposition of a general unitary U into the smallest number
of one and two qubit operations may not necessarily be the one most amenable to the utilization of
parallel multi-processor resources. More research in this area is highly warranted.

In general, the application of parallel multi-processors to numerical quantum simulation can only
increase the number of simulatable qubits by a finite amount (this directly addresses the power of
quantum computation over conventional parallel computation), as the following argument
illustrates. The number of qubits $n_{serial}$ that can be simulated on a single (serial) processor
can be estimated as $\log_2(N_{serial} = 2^{n_{serial}}) = n_{serial}$. Let the number of parallel
processors be given as a power of 2 as $N_{procrs} = 2^{n_{procrs}}$. The number of qubits
$n_{parallel}$ that can be simulated by utilizing $N_{procrs}$ processors is given by solving
$n_{serial} = \log_2(N_{parallel}/N_{procrs}) = \log_2(2^{n_{parallel}}/2^{n_{procrs}})$, with result
$n_{parallel} = n_{serial} + n_{procrs}$. This means we only get a logarithmic improvement in the
number of qubits that can be simulated as we increase the number of processors (e.g. 1024
processors only increases the number of simulatable qubits by 10). However, in practice it would
be very advantageous to simulate the number of qubits in a “small” quantum processor. Using
1 byte = 8 bits ≈ 10 bits (for order of magnitude estimates) we see that on a serial machine 50
qubits would require $2^{50} \approx (2^{10})^5 \approx (10^3)^5 = 10^{15}$ bits $\approx 10^{14}$
bytes = 100 TB of memory. However, using $1024 = 2^{10}$ parallel processors would require only
100 GB of memory per processor, which though large, is still within reach of today’s resources.
The use of hybrid coarse/fine grain simulation of quantum algorithms/circuits using MPI and
CUDA may be of help in this endeavor.
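
The scaling argument above can be reproduced with a few lines of Python. The first estimate follows the order-of-magnitude counting used in the text (one bit per basis state). The second, which stores one double-precision complex amplitude (16 bytes) per basis state, is an assumption added here for comparison and is not a figure from the report; the function names are illustrative.

```python
import math

def parallel_qubits(n_serial, n_processors):
    """n_parallel = n_serial + log2(N_procrs): the gain is only logarithmic."""
    return n_serial + math.log2(n_processors)

def state_bytes(n_qubits, bytes_per_basis_state):
    """Memory needed to hold 2^n basis-state entries of the given size."""
    return (2 ** n_qubits) * bytes_per_basis_state

# Counting used in the text: one bit (1/8 byte) per basis state, 50 qubits.
text_estimate = state_bytes(50, 1 / 8)
per_processor = text_estimate / 1024

# Assumption for comparison: one complex128 amplitude (16 bytes) per basis state.
complex_estimate = state_bytes(50, 16)

print(parallel_qubits(40, 1024))   # 1024 processors add 10 qubits
print(text_estimate / 1e12, "TB total;", per_processor / 1e9, "GB per processor")
print(complex_estimate / 1e15, "PB with 16-byte amplitudes")
```

Under the text's counting the total is on the order of 100 TB, or on the order of 100 GB per processor across 1024 processors; storing full complex amplitudes raises the requirement by roughly two orders of magnitude, which strengthens the point that added processors buy only a few extra qubits.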

Grover’s  quantum  search  algorithm:  theory 

The focus of the work explored in this LRIR was to illustrate a variant of Grover’s algorithm for
the case of a k = 2 indexed database state of the form

$|\psi_{db}\rangle_{xt} = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |x_i\rangle_x \otimes |t_i\rangle_t$.

The rationale behind our approach was to avoid the requirement that the oracle, implementing the
phase kickback operation, has to be supplied to the searcher of the database by an external agent
(as in the original formulation of the GSA). Instead, our variant of the GSA was designed so that
the searcher could initially encode the database state and subsequently search it at a later time,
without having to know the sought-for result in order to construct the phase kickback operator.

The variant of the GSA discussed in this work can easily be extended to multi-indexed database
states of the form

$|\psi_{db}\rangle_{xtsr\cdots} = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |x_i\rangle_x \otimes |t_i\rangle_t \otimes |s_i\rangle_s \otimes |r_i\rangle_r \cdots$.

If this generalized database state is encoded in the past, then at a later time a chosen subspace
component (e.g. the t-subspace telephone number as illustrated in this paper) can be searched on
through the construction of a phase kickback operation
$U_f^{t} = I_x \otimes U_f^{t} \otimes I_s \otimes I_r \cdots$ on that subspace, implementing
$f(t_i) = 1$ for a given $t_i$ and 0 otherwise, producing the result

$U_f^{t}\, |x_i\rangle_x \otimes |t_i\rangle_t \otimes |s_i\rangle_s \otimes |r_i\rangle_r = |x_i\rangle_x \otimes (-|t_i\rangle_t) \otimes |s_i\rangle_s \otimes |r_i\rangle_r = -|x_i\rangle_x \otimes |t_i\rangle_t \otimes |s_i\rangle_s \otimes |r_i\rangle_r$.

For a k-component database state (i.e. k different index states
$|x_i\rangle_x, |t_i\rangle_t, |s_i\rangle_s, |r_i\rangle_r, \ldots$), general amplitude
amplification as given in (12), $G = U_\psi U_f = A U_0 A^{-1} U_f$, can be used to perform an
$O(\sqrt{N})$ Grover search algorithm in the N dimensional subspace of a general $N^k$
dimensional Hilbert space. Again, A is the unitary operator that takes the standard state
$|0\rangle_{xtsr\cdots}$ to the database state $|\psi_{db}\rangle_{xtsr\cdots}$. The utility of this
approach depends upon the ease and efficiency of constructing the operator A, and hence the
quantum database state $|\psi_{db}\rangle$. Similar work along the lines of a quantum Grover
search upon multi-index states has been considered by Pang et al. [Pang06].
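
The search described above can be illustrated with a small classical simulation. The sketch below is a minimal model under stated assumptions: the database state is an equal superposition of N paired entries |x_i>|t_i> with distinct t_i, the phase kickback acts only on the t register, and the reflection is taken about the database state itself. Because both operations keep the state within the N dimensional subspace spanned by the paired entries, one amplitude per entry suffices. The function name and the sample database are hypothetical.

```python
import math

def grover_lookup(database, target_t):
    """Search a paired (x, t) database on the t component; return (x, probability)."""
    n = len(database)
    psi = 1.0 / math.sqrt(n)            # amplitude of each entry in the database state
    amps = [psi] * n                    # start in the encoded database state
    iterations = int(math.floor(math.pi / 4.0 * math.sqrt(n)))
    for _step in range(iterations):
        # Phase kickback on the t register: flip the sign where t_i matches the target.
        amps = [-a if t == target_t else a for a, (_x, t) in zip(amps, database)]
        # Reflection about the database state: 2|psi><psi| - I.
        overlap = sum(psi * a for a in amps)
        amps = [2.0 * overlap * psi - a for a in amps]
    probs = [a * a for a in amps]
    best = max(range(n), key=probs.__getitem__)
    return database[best][0], probs[best]

# Hypothetical database: names paired with telephone numbers.
entries = [("name%d" % i, 5550000 + i) for i in range(64)]
name, prob = grover_lookup(entries, 5550042)
print(name, prob)
```

After roughly (pi/4)*sqrt(N) iterations nearly all of the probability sits on the entry whose t component matches the query, so a measurement of the x register returns the associated name with high probability.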

Quantum  computing  in  a  piece  of  glass  using  volume  holograms 

For linear optical quantum computing the overarching advantage of constructing simple quantum
gates in volume holograms, as opposed to using the standard free-space optical approach, is
stability. Often quantum operators, e.g. the simple projection operator given by (11), require a
cascade of interferometers where the output of one is the input of the next. Thus, as the number
of qubits (and with it the dimension of the state space) increases, the cascade becomes
exceedingly hard to stabilize. Other approaches, such as crossed thin gratings, lack the efficiency
needed for QIP. The device proposed here can potentially achieve this in a single piece of glass
without the problem of misalignment. The technology presented here can potentially replace
"fixed" optical components in a broad spectrum of classical and quantum photonics experiments.

The primary limitation of volume holographic QIP is that it is not scalable. Experience shows
that multiplexing requires approximately 1 mm per recording of the state space to achieve high
fidelity, and in QIP applications this scales exponentially with the number of qubits. Secondly,
the holograms discussed here are write-once holograms and cannot be erased. Therefore, the
algorithm is "fixed" into the holographic emulsion. While there are re-recordable holographic
media, none that we know of have the specifications to outperform PTR glass for the
applications discussed in this manuscript. Although “static” gates are not the preferable media for
a quantum CPU, this technology might be integral to complete QIP systems where smaller
d-partite operations are needed on a routine basis, e.g. a quantum memory bus, quantum error
correction circuit, or quantum key distribution relay system.

While we have extensively analyzed these volume holographic quantum gates using coupled-mode
theory, paraxial wave equation simulations and finite-difference time domain simulations,
we have not fully analyzed the engineering particulars of this device. Many important practical
questions remain to be explored, for example: (1) how many independent writes of orthogonal
states into a holographic emulsion can be made in the PTR glass before cross-talk between the
modes becomes a limiting factor? (2) Is it difficult to stack the holograms due to the enhanced
angular selectivity of the volume holograms? And (3) what is the maximum number of
recordings in a multiplexed PTR hologram that can be reasonably achieved? In this sense, we
are well along in understanding these devices from a theoretical perspective. We are, however, at
the very beginning experimentally.

Cluster  state/one-way  quantum  computation 

The largest obstacle to physically implementing a quantum computer, in any architecture, is
decoherence - the unavoidable environmental degradation of quantum interference when one
interacts with the quantum computer in order to execute operations or perform measurements. In
realistic physical systems decoherence tends to make quantum systems behave more classically,
and thereby threatens to negate any computational advantage possessed by a quantum
computer. However, the effects of decoherence can be counteracted by quantum error correction
[Shor96]. In fact, arbitrarily large quantum computations can be performed with arbitrary
accuracy, provided the error level of the elementary components of the quantum computer is
below a certain threshold. This extremely important and relevant result has been named the
threshold theorem of quantum computation [Aliferis06] and allows for the possibility of fault
tolerant quantum computation. It is vitally important that the resources involved in performing
quantum error correction, before and after each quantum unitary gate, do not grow
exponentially, since that would again threaten to negate the computational advantage of quantum
computation over conventional computation. Conventional fault-tolerant schemes for OWQC
using photons have recently been developed [Dawson06, Varnava06]. The dominant sources of
error in this setting are photon loss and gate inaccuracies. In [Dawson06] both photon loss and
gate inaccuracies were taken into account, yielding a trade-off curve between the two respective
thresholds. Fault-tolerant optical computation is possible for e.g. a gate error (probability) rate of
$10^{-4}$ and photon loss rate of $3 \times 10^{-3}$.

The OWQC paradigm, utilizing cluster states as the initial fundamental entangled resource,
claims to substantially reduce the resources required for both optical quantum gates and for error
correction. Further, by encoding a collection of physical qubits within the 2D cluster state,
OWQC offers a means of topological error protection for the logical qubit. The simulations
conducted under this in-house project (Fig. 30) show that the threshold for an entangled Bell pair
creation is about 0.052 when encoding into a cluster state lattice size l = 13, and 0.083 for the
surface code with lattice size l = 29. These threshold rates of approximately 5% and 8% are
typically an order of magnitude higher than the most promising threshold rates obtained using
the standard quantum circuit model. The encouraging results of this portion of the in-house
research were used as a motivating factor to develop a follow-on in-house proposal for using
photon-based qubits to construct quantum gates and circuits to explore the OWQC cluster state
approach to quantum computation.

Quantum  information  science  testbed 

The focus of this in-house program was the construction of a 6 qubit capable testbed. The
utilization of a high power pulsed laser system allowed pump powers to reach a regime where
conducting multi-crystal/multi-stage experiments was feasible. Commonly used
down-conversion crystals typically produce a single pair of entangled photons per pass of the pump
laser. This limitation typically requires multiple crystals and high power to be used to increase
the number of qubits over the standard single pair level. With our in-house testbed the single pair
limitation was eliminated with the high power of our pump laser. The pulsed pump laser
simultaneously allowed for synchronization of the generated photons from multiple crystals, a
requirement when generating multi-qubit photon states. With the addition of the Schioedtei
source we have extended the capability of the testbed to 12 qubits while using one pass of the
pump laser through a single crystal, as opposed to a single pass through 6 separate crystals
[Pan07].

Temporally  compensated  crystal  assembly 

We have constructed a GVM compensated crystal assembly in the 800 nm regime. The
prototype assembly exhibited a useable photon generation efficiency greater than that of
standard SPDC crystals. When GVM can be achieved in practice other applications are enabled;
one such application is known as the frequency correlated state (FCS). In FCS the photon pairs
are always detected with identical frequency, although each photon is broadband. Several
applications have been identified for use of the FCS [Erdmann00, Wong05]. In the case of
type-II SPDC, the GVM crystal must be made relatively long, since the value of crystal length
determines the joint spectral width. Single crystal experiments have been performed at 1.55 µm,
but none at 800 nm for the reasons mentioned, namely that no such natural crystals exist. The
multi-crystal assembly offers an improvement, but to achieve results closer to the theoretical
GVM maximum case the number of crystal segments required becomes quite large. By
comparison, approximately 12 alternating layers have been demonstrated under this project,
whereas at least 50 would be needed for a high fidelity GVM, and greater than 100 would be
needed for a FCS.

The  next  version  of  the  crystal  will  aim  to  generate  a  joint  spectrum  closer  to  the  theoretical 
maximal  GVM  case.  This  requires  the  alternating  layers  of  β-BBO  and  calcite  to  be  reduced  in 
thickness  and  increased  in  number.  Because  calcite  is  brittle  and  soft,  a  more  robust  material, 
α-BBO,  has  been  chosen  as  our  temporal  compensator.  This  will  greatly  increase  the  durability  of 
the  crystal  stack  and  allow  for  a  greater  ease  of  construction.  The  construction  of  this  superlattice 
can  be  applied  towards  any  downconversion  crystal  to  remove  the  need  for  spectral  filtering. 

Schioedtei  crystal  assembly 

Here  we  have  described  the  initial  work  on  a  new  type-II  SPDC  source  design,  designated 
Schioedtei.  Schioedtei  allows  for  the  generation  of  six  pairs  of  entangled  photons  per  pass  of  the 
pump  laser  through  the  type-II  crystal  assembly,  compared  with  the  single  entangled  pair  per 
pass  typical  of  standard  type-II  SPDC  sources.  Useable  photon  rates 
resulting  in  two-  and  four-fold  coincidence  events  have  been  observed  from  Schioedtei, 
demonstrating  its  feasibility  as  a  source  of  entangled  photons  for  QIP.  The  six  pairs  of  photons 
produced  are  directly  applicable  to  the  generation  of  larger  entangled  states  for  use  in  CSQC. 
The  unique  and  advantageous  features  of  the  Schioedtei  source  are  (i)  a  more 
compact  experimental  setup  than  conventional  multi-stage  down-conversion 
configurations;  (ii)  generation  of  additional  states  beyond  those  produced  in  standard  SPDC 
sources;  and  (iii)  easier  creation  of  higher-order  entangled  states,  facilitated  by  the  variety 
and  number  of  those  additional  states. 
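
To make the notion of two- and four-fold coincidence events concrete, the following is a minimal sketch of windowed coincidence counting on detector timestamps. It is not the logic of the coincidence counting module used on the testbed; the function name, the timestamp lists, and the window value are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

def count_nfold(reference, others, window):
    """Count reference-channel events that have at least one partner event
    within +/- window in every one of the other channels."""
    others_sorted = [sorted(ch) for ch in others]
    count = 0
    for t in reference:
        if all(
            bisect_right(ch, t + window) > bisect_left(ch, t - window)
            for ch in others_sorted
        ):
            count += 1
    return count

# Hypothetical timestamps (arbitrary time units) for four detector channels.
t_a = [0.0, 10.0, 20.0, 30.0]
t_b = [0.4, 14.0, 19.7, 50.0]
t_c = [0.1, 20.2]
t_d = [-0.3, 35.0]

two_fold = count_nfold(t_a, [t_b], window=0.5)
four_fold = count_nfold(t_a, [t_b, t_c, t_d], window=0.5)
print(two_fold, four_fold)
```

In this toy data set two events in channel A have a partner in channel B, and only one of them also has partners in channels C and D, which illustrates why four-fold rates are lower than two-fold rates.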

7.0  LIST  OF  SYMBOLS,  ABBREVIATIONS,  AND  ACRONYMS 

AdQC:  Adiabatic  Quantum  Computing 

α-BBO:  Alpha  barium  borate 

API:  Application  programming  interface 

AR:  Anti-reflective 

β-BBO:  Beta  barium  borate 

Cbit:  Classical  bit 

CCD:  Charge  coupled  device 

CCM:  Coincidence  counting  module 

CIRC:  Circle  function 

CPU:  Central  processing  unit 

CSQC:  Cluster  state  quantum  computation 

CUDA:  Compute  Unified  Device  Architecture 

CW:  Continuous  wave 

DMA:  Direct  Memory  Access 

DPN:  Depolarized  noise 

DRAM:  Dynamic  random  access  memory 

FCS:  Frequency  correlated  state 

GPU:  Graphics  processor  unit 

GPGPU:  General  purpose  graphics  processor  unit 

GSA:  Grover’s  search  algorithm 

GVM:  Group  velocity  match 

HPCMP:  High  Performance  Computing  Modernization  Program 

HWP:  Half  wave  plate 

IAM:  Inversion  about  the  mean 

JEOM:  Joint  Education  Opportunities  for  Minorities 

MPI:  Message  passing  interface 

MQC:  Measurement-based  quantum  computation 

NL:  Nonlinear 

OWQC:  One-way  quantum  computation 

PBS:  Polarizing  beam  splitter 

PK:  Phase  kickback 

PMF:  Phase  matching  function 

PTR:  Photo-thermal  refractive 

QCM:  Quantum  circuit  model 

QEC:  Quantum  error  correction 

QIP:  Quantum  information  processing 

QIS:  Quantum  information  science 

QPM:  Quasi-phase  matching 

QSA:  Quantum  search  algorithm 

Qubit:  Quantum  bit 

QWP:  Quarter  wave  plate 

Si-APD:  Silicon  avalanche  photodiode 

SPDC:  Spontaneous  parametric  downconversion 

WP:  Wave  plate 